StarQuest Technical Documents
SQDR Plus: Oracle XStream Diagnostics
Last Update:13 September 2022
      Product: SQDR & SQDR Plus 
      Version: 5.30 and later 
      Article ID: SQV00PL072
Use the script xstream_diagnostics.sql to gather some diagnostic details about XStream on your Oracle source.
The queries are to be run in the root container as the XStream admin user (usually c##xstrmadmin in a pdb environment, xstrmadmin in a non-pdb environment.)
Please provide the output of this script along with tier2 support logs when you have an issue for StarQuest support to investigate.
This script (or the individual queries) can be run easily in SQL*Plus or SQL Developer. When running individual queries, run this first to make sure you get full date/time details
ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';
Here are the more details about some of the queries.:
Capture details including latency
SELECT CAPTURE_NAME,
      STATE,
      ((SYSDATE - CAPTURE_MESSAGE_CREATE_TIME)*86400) LATENCY_SECONDS,
      ((SYSDATE - CAPTURE_TIME)*86400) LAST_STATUS,
      TO_CHAR(CAPTURE_TIME, 'HH24:MI:SS MM/DD/YY') CAPTURE_TIME, 
      TO_CHAR(CAPTURE_MESSAGE_CREATE_TIME, 'HH24:MI:SS MM/DD/YY') CREATE_TIME
      FROM gV$XSTREAM_CAPTURE;
We can see from the output below that the XStream capture has a latency of 421465 seconds (117 hours).
      The reason for this latency needs to be investigated. 
      In this case, the XStream agent was stopped for several days and upon startup the oldest messages were 117 hours old.
CAPTURE_NAME	STATE	LATENCY_SECONDS	LAST_STATUS	CAPTURE_TIME	CREATE_TIME
      CAP$_RAC1PHX1NK_1	WAITING FOR TRANSACTION	421465	0	09:24:14 08/24/22	12:19:49 08/19/22
Current position of XStream Outbound Server
SELECT SERVER_NAME,
      SOURCE_DATABASE,
      PROCESSED_LOW_POSITION,
      TO_CHAR(PROCESSED_LOW_TIME,'HH24:MI:SS MM/DD/YY') PROCESSED_LOW_TIME
      FROM ALL_XSTREAM_OUTBOUND_PROGRESS;
Here we can see the processed_low_position for the XStream Outbound Server. This can tell us low watermark transacton which in turn can help us understand if the server is caught up and is updating its position regularly. This query also confirms the Outbound SERVER_NAME and the source database name.
SERVER_NAME	SOURCE_DATABASE	PROCESSED_LOW_POSITION	PROCESSED_LOW_TIME
      RAC1PHX1NK	RAC1_PDB1.SUB08091883250.VCN1.ORACLEVCN.COM	00000000052E80A2000000010000000100000000052E80A2000000010000000102	09:26:42 08/24/22
State of Capture
SELECT CAPTURE_NAME,
      STATE,
      TOTAL_MESSAGES_CAPTURED,
      TOTAL_MESSAGES_ENQUEUED 
      FROM gV$XSTREAM_CAPTURE;
          
      This query gives us important information about the state of the capture. Possible states include:
WAITING FOR REDO - This state shows the Outbound Server is waiting on redo logs to continue its work. If redo/archivelogs are missing then the Outbound Server (and StarQuest agent) will wait, without throwing errors until the missing redo is available
WAITING FOR REDO: FILE /rdsdbdata/db/JDEPROD_A/arch/redolog-635566-1-1069256902.arc, THREAD 1, SEQUENCE 635566, SCN 0x0000000c9e375c6a
CAPTURING CHANGES
PAUSED FOR FLOW CONTROL - This state means that the capture is unable to enqueue log entries either because of low memory or because propagations and outbound servers are consuming messages slower than the capture process is creating them. This state indicates flow control that is used to reduce spilling of captured LCRs when propagation or apply has fallen behind or is unavailable.
WAITING FOR TRANSACTION - This state shows the capture is waiting for LogMiner to provide more transactions.
The output of the query also shows number of messages capture and enqueued since the capure process was last started. Check that these numbers are increasing on a non-idle system.
CAPTURE_NAME	STATE	TOTAL_MESSAGES_CAPTURED	TOTAL_MESSAGES_ENQUEUED
      CAP$_RAC1PHX1NK_1	WAITING FOR TRANSACTION	1157829	423406
Memory used by Apply
SELECT APPLY_NAME AS APP,
      SGA_USED/(1024*1024) AS USED,
      SGA_ALLOCATED/(1024*1024) AS ALLOCATED,
      TOTAL_MESSAGES_DEQUEUED AS DEQUEUED,
      TOTAL_MESSAGES_SPILLED AS SPILLED
      FROM gV$XSTREAM_APPLY_READER;
This query shows important information about memory used and allocated by the apply process along with number of messages dequeued and spilled. A large number of spilled messages may affect performance and may require some tuning of parameters of the apply.
APP	USED	ALLOCATED	DEQUEUED	SPILLED
      RAC1PHX1NK	421.83728790283203125	435.55146026611328125	423078	0
Streams Pool memory usage
select * from gv$sgastat where pool like '%streams%' and BYTES > 1000 order by BYTES desc;
This query gives us more details on streams pool usage. The important components to examine are free memory in the streams pool and how much message is being used by 'apply shared t'. This can confirm low memory conditions along with the previous query about the apply above.
INST_ID	POOL	NAME	BYTES	CON_ID
      1	streams pool	free memory	5942242160	0
      1	streams pool	apply shared t	459312448	1
      1	streams pool	Logminer LCR c	25012048	1
      1	streams pool	capture shared	15850752	1
      1	streams pool	capture shared sga	6272	1
      1	streams pool	apply shared transactions	6272	1
      1	streams pool	recov_kgqbtctx	5952	1
      1	streams pool	memory_knlso	4200	1
      1	streams pool	Sender info	3128	1
      1	streams pool	KGH: NO ACCESS	1920	1
      1	streams pool	krvxhds	1792	1
      1	streams pool	kwqbcqini:spilledovermsgs	1584	1
Replication events including errors
select * FROM ALL_REPLICATION_PROCESS_EVENTS order by EVENT_TIME desc
This view can reveal details about replication events involving XStream capture and apply. Columns for error and error message will reveal ORA- errors that have been raised.
      If the query returns too many rows you can limit to a specific time frame (where EVENT_TIME > sysdate -3) or only look at errors (where ERROR_NUMBER is not null.
      Below we can see output when an Outbound Server was stopped and the Apply then gave an error that it was disabled and then stopped then started again successfully. 
STREAMS_TYPE	PROCESS_TYPE	STREAMS_NAME	EVENT_NAME	DESCRIPTION	EVENT_TIME	ERROR_NUMBER	ERROR_MESSAGE
      XSTREAM	APPLY SERVER	RAC1PHX1NK	START	SUCCESS	19-AUG-22 12.46.05.824735000 PM 
      XSTREAM	APPLY READER	RAC1PHX1NK	START	SUCCESS	19-AUG-22 12.46.05.814012000 PM 
      XSTREAM	APPLY COORDINATOR	RAC1PHX1NK	START	SUCCESS	19-AUG-22 12.46.05.785575000 PM 
      XSTREAM	APPLY READER	RAC1PHX1NK	STOP	SUCCESS	19-AUG-22 12.46.05.430969000 PM 
      XSTREAM	APPLY SERVER	RAC1PHX1NK	STOP	SUCCESS	19-AUG-22 12.46.05.430969000 PM 
      XSTREAM	APPLY SERVER	RAC1PHX1NK	ABORT	ERROR	19-AUG-22 12.46.01.895823000 PM	26920	ORA-26920:  outbound server "" has been stopped.
Additional diagnostics
For more detailed information, we also suggest the script in the Oracle Support note
      Streams Configuration Report and Health Check Script (Doc ID 273674.1)
Though the document refers to Oracle 12, the script runs on later versions as well. Also, if you have a RAC (Real Application Cluster) system, run the Streams Health Check Script on each node of the Oracle RAC database. This script needs to be run with a user that has sysdba privileges.
Oracle documentation
 Monitoring XStream Out (Oracle 11.2)
        Monitoring XStream Out (Oracle 12.2)
      Monitoring XStream Out (Oracle 19.1)
    
DISCLAIMER
The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization. The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.
 
	