Source Code

Here are some of the software packages developed and managed by Yan Han.

 

Software Packages for ProQuest/UMI new Electronic Theses and Dissertations (ETD)

1. Software Package for ProQuest/UMI ETD DTD 4.3 (After July 1, 2008, ProQuest's new delivery platform)

 When ProQuest/UMI switched its delivery platform for Electronic Theses and Dissertations (ETDs), I developed a small software package to process them. The package does the following:

  • Unzip the zipped ProQuest/UMI ETD delivery files and create one directory per ETD.
  • Rename the ETD files to preferred file names (in my case, Wang_arizona_0009D_10075.xml --> azu_etd_10075_sip1_m.xml).
  • Generate a digital signature for digital preservation.
  • Create MARC records from the ProQuest/UMI XML files (i.e., an MRK file is generated for direct loading into the catalog; I load the MRC file into Innovative and Koha).
  • Create embargo notifications and move embargoed ETDs to a different directory for future loading.

This package can save time when processing hundreds of ETDs. It consists of compiled Java code (class file) and Perl scripts. I currently run it on Linux, but it can also be run on Windows.
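The renaming and signature steps listed above can be illustrated with a short Python sketch. This is not the package's own code (which is Java and Perl); it is a minimal illustration under stated assumptions. The file-name regex is inferred from the single example on this page, the function names `preferred_xml_name` and `sha256_of` are hypothetical, and SHA-256 is only an assumed choice, since the page does not say which kind of digital signature the package generates.

```python
import hashlib
import re
import tempfile
from pathlib import Path

# Assumed pattern, inferred from the example name on this page:
# [authorLastName]_[institutionName]_0009[D|M]_[id].xml, with an optional _DATA part.
_ETD_XML = re.compile(r"^.+_[^_]+_0009[DM]_(?P<etd_id>\d+)(?:_DATA)?\.xml$")


def preferred_xml_name(original: str) -> str:
    """Map a delivered XML file name to the local naming scheme (illustrative)."""
    match = _ETD_XML.match(original)
    if match is None:
        raise ValueError(f"not a recognised ETD XML name: {original!r}")
    return f"azu_etd_{match.group('etd_id')}_sip1_m.xml"


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    SHA-256 is an assumption; the package's actual algorithm is not stated.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demonstrate on a throwaway placeholder file, not a real ETD.
    with tempfile.TemporaryDirectory() as tmp:
        delivered = Path(tmp) / "Wang_arizona_0009D_10075.xml"
        delivered.write_bytes(b"<placeholder/>")
        renamed = delivered.rename(
            delivered.with_name(preferred_xml_name(delivered.name))
        )
        print(renamed.name, sha256_of(renamed))
```

Storing the digest next to the renamed file would let a later fixity check detect silent changes, which is the usual purpose of such a value in digital preservation.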

  • ProQuest/UMI original file pattern: a zipped file named etdadmin_upload_[0-9].zip (e.g. etdadmin_upload_1242.zip), which unzips to:
      1. PDF: [authorLastName]_[institutionName]_0009[D|M]_[0-9].pdf (e.g. Zhu_arizona_0009D_10020.pdf)
      2. XML: [authorLastName]_[institutionName]_0009[D|M]_[0-9]_DATA.xml (e.g. Zhu_arizona_0009D_10020_DATA.xml)
      3. A directory: [authorLastName]_[institutionName]_0009[D|M]_63/ (e.g. Zhu_arizona_0009D_63/). This directory is empty; presumably it is reserved for optional files.
      4. Assumption: the unique ID should be 10020, as in the PDF and XML file names; the directory name does not contain it.
    • Example:
      [hany@yanbox test]$ unzip etdadmin_upload_3002.zip
      Archive:  etdadmin_upload_3002.zip
        inflating: Choi_arizona_0009D_10081.pdf 
        creating: Choi_arizona_0009D_63/
        inflating: Choi_arizona_0009D_10081_DATA.xml 
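
The file pattern above can be checked mechanically. The following Python sketch is an illustration only, not part of the package: the regular expression is a hypothetical reading of the pattern as written on this page, with `[0-9]` taken to mean one or more digits, and the names `EtdEntry`, `parse_entry`, and `unique_id` are made up for this example. It classifies each entry of a delivery and recovers the unique ID from the PDF and XML names while ignoring the directory.

```python
import re
from typing import List, NamedTuple, Optional


class EtdEntry(NamedTuple):
    author: str
    institution: str
    level: str    # "D" or "M", as in the 0009[D|M] part of the pattern
    number: str   # digits that follow the level letter
    kind: str     # "pdf", "xml", or "dir"


# Hypothetical regex for the three entry shapes described above.
_ENTRY = re.compile(
    r"^(?P<author>.+)_(?P<institution>[^_]+)_0009(?P<level>[DM])_(?P<number>\d+)"
    r"(?P<tail>\.pdf|_DATA\.xml|/)$"
)
_KINDS = {".pdf": "pdf", "_DATA.xml": "xml", "/": "dir"}


def parse_entry(name: str) -> Optional[EtdEntry]:
    """Parse one archive entry name; return None if it does not fit the pattern."""
    match = _ENTRY.match(name)
    if match is None:
        return None
    return EtdEntry(
        match["author"],
        match["institution"],
        match["level"],
        match["number"],
        _KINDS[match["tail"]],
    )


def unique_id(names: List[str]) -> Optional[str]:
    """Return the ID shared by the PDF and XML entries, ignoring the directory."""
    ids = {
        entry.number
        for entry in map(parse_entry, names)
        if entry is not None and entry.kind != "dir"
    }
    return ids.pop() if len(ids) == 1 else None


if __name__ == "__main__":
    listing = [
        "Choi_arizona_0009D_10081.pdf",
        "Choi_arizona_0009D_63/",
        "Choi_arizona_0009D_10081_DATA.xml",
    ]
    print(unique_id(listing))
```

Returning None when the PDF and XML names disagree is a deliberate choice here: a delivery whose files carry different IDs is better flagged for manual review than silently renamed.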

 Download package etd_tool_for_proquest_dtd_4_3_20090326.tar now (3 MB). Please read README first!

2. Software Package for ProQuest/UMI ETD DTD 4.2 and below (Before July 1, 2008, using the BePress delivery platform)

This package was used to process deliveries from ProQuest's previous delivery platform, which used BePress.

 

BePress original file pattern: a zipped file named upload-[0-9].zip (e.g. upload-3025.zip), which unzips to:

  1. One or more directories: [0-9]/
    1. PDF: umi-[institutionName]-[0-9].pdf
    2. XML: umi-[institutionName]-[0-9].xml
    3. May include a directory for optional files
  2. Example:
     [hany@yanbox test1]$ unzip upload-3299.zip
     Archive:  upload-3299.zip
        creating: 2022/
       inflating: 2022/umi-arizona-2022.pdf
       inflating: 2022/umi-arizona-2022.xml
        creating: 2028/
       inflating: 2028/umi-arizona-2028.pdf
       inflating: 2028/umi-arizona-2028.xml
        creating: 2045/
       inflating: 2045/umi-arizona-2045.pdf
       inflating: 2045/umi-arizona-2045.xml
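
Because one BePress zip file can carry several ETDs, a quick completeness check per directory can be useful before further processing. The sketch below is a hypothetical Python illustration, not the package's code: the regex is inferred from the pattern and example above, and `group_by_folder` and `incomplete_folders` are invented names. It groups archive member names by directory and reports directories that lack either the PDF or the XML file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical regex for members such as 2022/umi-arizona-2022.pdf.
_MEMBER = re.compile(
    r"^(?P<folder>\d+)/umi-(?P<institution>[^/]+)-(?P<number>\d+)\.(?P<ext>pdf|xml)$"
)


def group_by_folder(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group PDF and XML member names by their top-level directory.

    Entries that do not fit the pattern (directory entries, optional
    files) are skipped.
    """
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in names:
        match = _MEMBER.match(name)
        if match is not None:
            groups[match["folder"]][match["ext"]] = name
    return dict(groups)


def incomplete_folders(groups: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the directories that are missing the PDF or the XML file."""
    return sorted(
        folder for folder, files in groups.items() if set(files) != {"pdf", "xml"}
    )


if __name__ == "__main__":
    # Member names as they would appear in the archive listing above.
    listing = [
        "2022/", "2022/umi-arizona-2022.pdf", "2022/umi-arizona-2022.xml",
        "2028/", "2028/umi-arizona-2028.pdf", "2028/umi-arizona-2028.xml",
        "2045/", "2045/umi-arizona-2045.pdf", "2045/umi-arizona-2045.xml",
    ]
    groups = group_by_folder(listing)
    print(sorted(groups), incomplete_folders(groups))
```

With a real archive, the member names could come from `zipfile.ZipFile(path).namelist()` in the Python standard library. Note that a directory containing neither a matching PDF nor a matching XML file does not appear in the result at all.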

Download package etd_tool_for_bepress.tar now (3 MB). Please read README first!

 

Last Updated ( Monday, 06 July 2009 )
 

Site maintained by University of Arizona Libraries


 
