  1. SIdora
  2. SID-125

Observation Resource Indexing


    • Type: Bug
    • Resolution: Unresolved
    • 0.4
    • None
    • None
    • None
    • 119

      The question is whether we need to store the CSV datastream at all. I think the users' need for CSV output can be met by generating the CSV from the result of a risearch query with an xslt transformation, so that any subset of observations can be delivered as CSV. I will make an example xslt to generate the CSV.
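
      A minimal sketch of that idea, written in Python rather than xslt. It assumes the risearch tuple result is requested in its XML (Sparql) format and has the shape shown in SAMPLE_RESULT; the sample values and the function names are illustrative only, not part of the repository.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed shape of a risearch tuple result in XML form; sample values only.
SAMPLE_RESULT = """<sparql xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result">
  <head>
    <variable name="speciesname"/>
    <variable name="cameratraptitle"/>
  </head>
  <results>
    <result>
      <speciesname>Helarctos malayanus</speciesname>
      <cameratraptitle>Thai_Cam_Deploy_80</cameratraptitle>
    </result>
  </results>
</sparql>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def sparql_xml_to_csv(xml_text):
    """Return CSV text with one column per result variable."""
    root = ET.fromstring(xml_text)
    columns = [el.get("name") for el in root.iter()
               if local_name(el.tag) == "variable"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for result in root.iter():
        if local_name(result.tag) != "result":
            continue
        cells = {local_name(child.tag): child for child in result}
        row = []
        for name in columns:
            cell = cells.get(name)
            if cell is None:
                row.append("")
            else:
                # Resource bindings carry a uri attribute; literals carry text.
                row.append(cell.get("uri") or (cell.text or "").strip())
        writer.writerow(row)
    return out.getvalue()
```

      Because the columns come from the query's variable list, the same conversion would apply to any subset or join the query selects.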

      Further, the potential foxml cannot contain structured RELS-EXT entries, so I think there needs to be one foxml per image, with its metadata.

      Gert

      On 16/07/2013, at 17.44, Hua, Dong wrote:

      Two issues need to be confirmed:

      1. Besides the CSV datastream, we need to duplicate every entry from the CSV datastream into RELS-EXT, to support joins and subset selection based on the proposed format.
      2. When the CSV is updated, the RELS-EXT needs to be updated automatically.
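
      One way to keep the two in step would be to regenerate the RELS-EXT entries from the CSV whenever it changes. The following is a rough sketch under several assumptions: one flat rdf:Description per CSV row, a PID built as si:<Deploy ID>_<Sequence ID>, and empty or "null" cells skipped. The function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBS_NS = "http://localhost/obs/"  # namespace used in the proposed foxml

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("obs", OBS_NS)


def row_to_description(row):
    """Build one rdf:Description with a flat obs:* property per filled column."""
    # Assumed PID scheme: si:<Deploy ID>_<Sequence ID>
    pid = "si:{}_{}".format(row["Deploy ID"], row["Sequence ID"])
    desc = ET.Element("{%s}Description" % RDF_NS,
                      {"{%s}about" % RDF_NS: "info:fedora/" + pid})
    for column, value in row.items():
        value = (value or "").strip()
        if value and value != "null":
            # Column "Species Name" becomes property obs:Species_Name
            prop = ET.SubElement(desc, "{%s}%s" % (OBS_NS, column.replace(" ", "_")))
            prop.text = value
    return desc


def csv_to_descriptions(csv_text):
    """Parse the observation CSV and return one description per data row."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [row_to_description(row) for row in reader]
```

      Calling ET.tostring() on each returned element gives the RDF/XML that could replace the object's RELS-EXT content after a CSV update.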

      As for the foxml format, I have just updated the potential foxml to handle multiple observation entries. Is it OK?

      Here is the updated potential foxml for this observation:

      <?xml version="1.0" encoding="UTF-8"?>
      <foxml:digitalObject VERSION="1.1" PID="si:Thai_Cam_Deploy_80_Thai_IMG_186"
      xmlns:foxml="info:fedora/fedora-system:def/foxml#"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
      <foxml:objectProperties>
      <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
      <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Thai_Cam_Deploy_80 Thai_IMG_186"/>
      <foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE=" "/>
      </foxml:objectProperties>
      <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
      <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="RDF Statements about this Object" CREATED="2012-12-13T14:21:22.017Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0">
      <foxml:xmlContent>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:fedora-model="info:fedora/fedora-system:def/model#"
      xmlns:obs="http://localhost/obs/">
      <rdf:Description rdf:about="info:fedora/si:Thai_Cam_Deploy_80_Thai_IMG_186">
      <fedora-model:hasModel rdf:resource="info:fedora/si:observationCModel"></fedora-model:hasModel>
      <obs:series>
      <obs:record>
      <obs:Deploy_ID>Thai_Cam_Deploy_80</obs:Deploy_ID>
      <obs:Sequence_ID>Thai_IMG_186</obs:Sequence_ID>
      <obs:Begin_Time>2004-03-19 17:12:00.000</obs:Begin_Time>
      <obs:End_Time>2004-03-19 17:12:00.000</obs:End_Time>
      <obs:Species_Name>Helarctos malayanus</obs:Species_Name>
      <obs:Common_Name>Sun Bear</obs:Common_Name>
      </obs:record>
      <obs:record>
      <obs:Deploy_ID>Thai_Cam_Deploy_81</obs:Deploy_ID>
      <obs:Sequence_ID>Thai_IMG_186</obs:Sequence_ID>
      <obs:Begin_Time>2004-03-19 17:12:00.000</obs:Begin_Time>
      <obs:End_Time>2004-03-19 17:12:00.000</obs:End_Time>
      <obs:Species_Name>Helarctos malayanus</obs:Species_Name>
      <obs:Common_Name>Sun Bear</obs:Common_Name>
      </obs:record>
      </obs:series>

      </rdf:Description>
      </rdf:RDF>
      </foxml:xmlContent>
      </foxml:datastreamVersion>
      </foxml:datastream>
      </foxml:digitalObject>

      Here is part of an example observation table:

      Deploy ID, Sequence ID, Begin Time, End Time, Species Name, Common Name, Age, Sex, Individual, Count
      Thai_Cam_Deploy_80,Thai_IMG_186,2004-03-19 17:12:00.000,2004-03-19 17:12:00.000,Helarctos malayanus ,Sun Bear,null,null,,
      Thai_Cam_Deploy_81,Thai_IMG_186,2004-03-19 17:12:00.000,2004-03-19 17:12:00.000,Helarctos malayanus ,Sun Bear,null,null,,

      Dong

      Hi Beth,

      Thank you. The xml schema is important, and I have no problem with it by itself. However, although it will allow indexing in solr, and thereby a lot of search possibilities, there is a problem: solr search does not facilitate joins of the kind available in a relational database with sql, or in a fedora risearch based on RELS-EXT.

      Let me give an example of what I want to do:

      Here is part of an observation table:

      Deploy ID, Sequence ID, Begin Time, End Time, Species Name, Common Name, Age, Sex, Individual, Count
      Thai_Cam_Deploy_80,Thai_IMG_186,2004-03-19 17:12:00.000,2004-03-19 17:12:00.000,Helarctos malayanus ,Sun Bear,null,null,,

      Here is a potential foxml for this observation:

      <?xml version="1.0" encoding="UTF-8"?>
      <foxml:digitalObject VERSION="1.1" PID="si:Thai_Cam_Deploy_80_Thai_IMG_186"
      xmlns:foxml="info:fedora/fedora-system:def/foxml#"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="info:fedora/fedora-system:def/foxml# http://www.fedora.info/definitions/1/0/foxml1-1.xsd">
      <foxml:objectProperties>
      <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="Active"/>
      <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Thai_Cam_Deploy_80 Thai_IMG_186"/>
      <foxml:property NAME="info:fedora/fedora-system:def/model#ownerId" VALUE=" "/>
      </foxml:objectProperties>
      <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
      <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="RDF Statements about this Object" CREATED="2012-12-13T14:21:22.017Z" MIMETYPE="application/rdf+xml" FORMAT_URI="info:fedora/fedora-system:FedoraRELSExt-1.0">
      <foxml:xmlContent>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:fedora="info:fedora/fedora-system:def/relations-external#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:fedora-model="info:fedora/fedora-system:def/model#"
      xmlns:obs="http://localhost/obs/">
      <rdf:Description rdf:about="info:fedora/si:Thai_Cam_Deploy_80_Thai_IMG_186">
      <fedora-model:hasModel rdf:resource="info:fedora/si:observationCModel"></fedora-model:hasModel>
      <obs:Deploy_ID>Thai_Cam_Deploy_80</obs:Deploy_ID>
      <obs:Sequence_ID>Thai_IMG_186</obs:Sequence_ID>
      <obs:Begin_Time>2004-03-19 17:12:00.000</obs:Begin_Time>
      <obs:End_Time>2004-03-19 17:12:00.000</obs:End_Time>
      <obs:Species_Name>Helarctos malayanus</obs:Species_Name>
      <obs:Common_Name>Sun Bear</obs:Common_Name>
      </rdf:Description>
      </rdf:RDF>
      </foxml:xmlContent>
      </foxml:datastreamVersion>
      </foxml:datastream>
      </foxml:digitalObject>

      Now I may do a risearch query like this:

      select $speciesname $cameratrap $cameratraptitle $ctplot $ctplottitle $project $projecttitle from <#ri>
      where
      $obs <http://localhost/obs/Species_Name> $speciesname and
      $obs <http://localhost/obs/Deploy_ID> $cameratraptitle and
      $cameratrap <http://purl.org/dc/elements/1.1/title> $cameratraptitle and
      $cameratrap <info:fedora/fedora-system:def/model#hasModel> <info:fedora/si:cameraTrapCModel> and
      $cameratrap <info:fedora/fedora-system:def/relations-external#isMemberOfCollection> $ctplot and
      $ctplot <http://purl.org/dc/elements/1.1/title> $ctplottitle and
      $ctplot <info:fedora/fedora-system:def/model#hasModel> <info:fedora/si:ctPlotCModel> and
      $ctplot <info:fedora/fedora-system:def/relations-external#isMemberOfCollection> $project and
      $project <http://purl.org/dc/elements/1.1/title> $projecttitle and
      $project <info:fedora/fedora-system:def/model#hasModel> <info:fedora/si:projectCModel>
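
      A sketch of how such a query might be submitted over HTTP. The host, port, and context path are hypothetical, the query is a cut-down version of the one above, and the parameter names (type, lang, format, query) are those of the Fedora risearch interface as I understand it; check them against the installed version.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and context path to the install.
RISEARCH_URL = "http://localhost:8080/fedora/risearch"

# Cut-down version of the query, for illustration only.
QUERY = """select $speciesname $cameratraptitle from <#ri>
where
$obs <http://localhost/obs/Species_Name> $speciesname and
$obs <http://localhost/obs/Deploy_ID> $cameratraptitle"""


def build_risearch_url(query, fmt="CSV", base=RISEARCH_URL):
    """Return a GET URL that asks risearch for tuple results in the given format."""
    params = {"type": "tuples", "lang": "itql", "format": fmt, "query": query}
    return base + "?" + urlencode(params)
```

      The returned URL can be fetched with any HTTP client; urlencode takes care of escaping the `$`, `<`, `#`, and newline characters in the query.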

      I think such queries are necessary to satisfy all the potential needs for selecting subsets of observations.

      I am curious to know whether you have considered this, whether you agree or disagree, and whether you have any questions.

      Best,
      Gert

      Hi Gert,

      The observation objects are derived from a combination of image sequence, image, animal and individual animal. The camera trap researchers identify animals based on a sequence of images. In the logical model they also use a count to indicate how many of a specific species were identified in the sequence. Instead of using count in our observation objects, we are repeating the observation once for each individual identified, so if there were three individual dogs identified in a sequence of 10 pictures, we would have three lines in the observation file with the species of dog. One of the dogs may be a recurring animal to the camera, in which case the researcher may have a name and notes; that is why some observations will have an individual animal id, name and notes. In the case of some data, there will be a researcher observation file and a volunteer observation file.
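
      The count-to-rows expansion described above could be sketched as follows. The field name "Count" is taken from the observation table header; treating a missing or non-positive count as one row is an assumption, and the function name is hypothetical.

```python
def expand_counts(rows, count_field="Count"):
    """Repeat each observation row once per counted individual."""
    expanded = []
    for row in rows:
        raw = (row.get(count_field) or "").strip()
        # Assumption: a blank, non-numeric, or zero count means one observation.
        count = int(raw) if raw.isdigit() and int(raw) > 0 else 1
        for _ in range(count):
            copy = dict(row)
            copy.pop(count_field, None)  # the count is now implied by repetition
            expanded.append(copy)
    return expanded
```

      With this, a single row for dogs with a count of 3 becomes three observation rows, matching the three-dogs example.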

      Dong and I worked briefly, before I left for Open Repositories, on coming up with an xml schema to represent the observation files. It is attached. I have also cc’d Dong and Thorny in case they have input. Dong will generate the observation files and xml on ingest of the camera trap data.

      Thanks.
      Beth

      Hi Beth,

      Thank you. The logical model I referred to is the one found under Shared Documents, called CameraTrapsLogicalModel.docx.

      It has entities like Image, but not Observation, which I would expect, since we have the Observation tables in the repository. Maybe Image has the observations in it?

      My current interest is to make individual observations selectable via risearch queries. This would require that the observations each be represented by a Fedora object with RELS-EXT relationships for each of the columns in the observation tables, and ideally these relationships would also be found in the logical model.

      So my main question for now is: how are the observation tables included in the logical model?

      Thanks,
      Gert

            DavisDa Davis, Daniel