Merge XML files
ApplyMergeXSLT is a Java application that merges a set of UTF-8 XML files sharing the same root element.
The application has the following directory layout:
-- ApplyMergeXSLT
   -- bin
   -- lib
   -- src
   -- X_OUTPUT
   -- X_SOURCE
   -- xslt
   -- run.bat
   -- run_wrapper.bat
The X_SOURCE directory must contain the input XML files to be merged. It can hold the source XML files directly,
or a directory tree whose leaves are the XML files (i.e. the procedure recurses through the file
system).
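The recursive lookup described above can be sketched as follows. This is only an illustration, not the application's actual code: the class name SourceFileCollector and the method collectXmlFiles are hypothetical, and selecting files by the ".xml" extension is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of ApplyMergeXSLT.
public class SourceFileCollector {

    // Returns every regular file ending in ".xml" below root, at any depth,
    // in sorted order so that the merge order is reproducible.
    static List<Path> collectXmlFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Assumption: input files are recognised by their extension.
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the X_SOURCE directory named in this document.
        Path root = Paths.get(args.length > 0 ? args[0] : "X_SOURCE");
        for (Path p : collectXmlFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Run from the ApplyMergeXSLT directory, the main method would list the XML files that a merge could pick up, whether they sit directly in X_SOURCE or in nested subdirectories.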
The two .bat files are the Windows entry points to the application
(an equivalent .sh script is easy to write for a Unix system). “run.bat” should be used to produce an output XML file
that merges the input XML files while keeping their common root element. For example:
File1.xml
<Publisher>
  <Journal>
    <JournalInfo>
      <JournalElectronicISSN>2080-2218</JournalElectronicISSN>
      <JournalTitle>Advances in Cell Biology</JournalTitle>
    </JournalInfo>
  </Journal>
</Publisher>
File2.xml
<Publisher>
  <Journal>
    <JournalInfo>
      <JournalPrintISSN>0004-1254</JournalPrintISSN>
      <JournalTitle>Archives of Industrial Hygiene</JournalTitle>
    </JournalInfo>
  </Journal>
</Publisher>
merged.xml
<Publisher>
  <Journal>
    <JournalInfo>
      <JournalElectronicISSN>2080-2218</JournalElectronicISSN>
      <JournalTitle>Advances in Cell Biology</JournalTitle>
    </JournalInfo>
  </Journal>
  <Journal>
    <JournalInfo>
      <JournalPrintISSN>0004-1254</JournalPrintISSN>
      <JournalTitle>Archives of Industrial Hygiene</JournalTitle>
    </JournalInfo>
  </Journal>
</Publisher>
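The merge shown above (one shared root, with the children of every input appended in turn) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the application name suggests it relies on an XSLT stylesheet, whereas this sketch uses plain DOM calls from the JDK. The class name RootMergeSketch and the method merge are hypothetical, and attributes on the root element are not carried over.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not part of ApplyMergeXSLT.
public class RootMergeSketch {

    // Merges XML documents (given as strings) that share the same root element name.
    static String merge(List<String> xmlSources) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = null;
        for (String xml : xmlSources) {
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            if (merged == null) {
                // The first input decides the root element name (its attributes are not copied).
                merged = builder.newDocument();
                merged.appendChild(merged.createElement(root.getTagName()));
            } else if (!merged.getDocumentElement().getTagName().equals(root.getTagName())) {
                throw new IllegalArgumentException("Root elements differ: " + root.getTagName());
            }
            // Deep-copy every child of this input's root under the shared root.
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                merged.getDocumentElement().appendChild(merged.importNode(child, true));
            }
        }
        if (merged == null) {
            throw new IllegalArgumentException("No input documents");
        }
        // Serialize the merged DOM tree back to text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(merged), new StreamResult(out));
        return out.toString();
    }
}
```

Calling RootMergeSketch.merge with the contents of File1.xml and File2.xml would yield a single Publisher element holding both Journal elements, in input order.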
The other executable file, “run_wrapper.bat”, should be used when we want
to keep the root element but also wrap the content of each source document
in a wrapper element (for example, because the root of each XML file has more than one child). As an
example:
File1.xml
<Publisher>
  <PublisherInfo>
    <PublisherName>Birkhäuser-Verlag</PublisherName>
  </PublisherInfo>
  <Journal OutputMedium="All">
    <JournalInfo JournalProductType="ArchiveJournal">
      <JournalPrintISSN>0004-069X</JournalPrintISSN>
      <JournalElectronicISSN>1661-4917</JournalElectronicISSN>
      <JournalTitle>Archivum Immunologiae</JournalTitle>
    </JournalInfo>
  </Journal>
</Publisher>
File2.xml
<Publisher>
  <PublisherInfo>
    <PublisherName>Birkhäuser-Verlag</PublisherName>
  </PublisherInfo>
  <Journal OutputMedium="All">
    <JournalInfo JournalProductType="ArchiveJournal">
      <JournalPrintISSN>0004-069X</JournalPrintISSN>
      <JournalElectronicISSN>1661-4917</JournalElectronicISSN>
      <JournalTitle>Archivum Immunologiae</JournalTitle>
    </JournalInfo>
  </Journal>
</Publisher>
merged.xml
<Publisher>
  <wrapper>
    <PublisherInfo>
      <PublisherName>Birkhäuser-Verlag</PublisherName>
    </PublisherInfo>
    <Journal OutputMedium="All">
      <JournalInfo JournalProductType="ArchiveJournal">
        <JournalPrintISSN>0004-069X</JournalPrintISSN>
        <JournalElectronicISSN>1661-4917</JournalElectronicISSN>
        <JournalTitle>Archivum Immunologiae</JournalTitle>
      </JournalInfo>
    </Journal>
  </wrapper>
  <wrapper>
    <PublisherInfo>
      <PublisherName>Birkhäuser-Verlag</PublisherName>
    </PublisherInfo>
    <Journal OutputMedium="All">
      <JournalInfo JournalProductType="ArchiveJournal">
        <JournalPrintISSN>0004-069X</JournalPrintISSN>
        <JournalElectronicISSN>1661-4917</JournalElectronicISSN>
        <JournalTitle>Archivum Immunologiae</JournalTitle>
      </JournalInfo>
    </Journal>
  </wrapper>
</Publisher>
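The wrapped variant shown above can be sketched in the same spirit: the children of each source root are grouped under their own wrapper element, so that content coming from different files stays separated. Again this is a hedged DOM illustration rather than the stylesheet the application uses; the class name WrapperMergeSketch, the method mergeWrapped and the wrapperName parameter are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not part of ApplyMergeXSLT.
public class WrapperMergeSketch {

    // Merges XML documents under one root, wrapping each document's content separately.
    static String mergeWrapped(List<String> xmlSources, String wrapperName) throws Exception {
        if (xmlSources.isEmpty()) {
            throw new IllegalArgumentException("No input documents");
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();
        Element mergedRoot = null;
        for (String xml : xmlSources) {
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            if (mergedRoot == null) {
                // The first input decides the root element name (its attributes are not copied).
                mergedRoot = merged.createElement(root.getTagName());
                merged.appendChild(mergedRoot);
            }
            // One wrapper element per source document.
            Element wrapper = merged.createElement(wrapperName);
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                wrapper.appendChild(merged.importNode(child, true));
            }
            mergedRoot.appendChild(wrapper);
        }
        // Serialize the merged DOM tree back to text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(merged), new StreamResult(out));
        return out.toString();
    }
}
```

With "wrapper" passed as wrapperName, each input's PublisherInfo and Journal elements would end up together inside their own wrapper element, matching the shape of the example.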
Note: the output file, “merged.xml”, is written to the X_OUTPUT directory. Moreover, it
does not retain any DTD declarations from the source files.