Use this tool to assess whether your data is 'FAIR' - Findable, Accessible, Interoperable, Reusable.
Click the information button for explanatory information and links to wider resources.
To learn more about FAIR data visit: https://www.crosslateral.com.au/blog/what-is-fair-data.html
We have the expertise to make your data FAIR. Contact us at info@crosslateral.com.au to find out more.
Making data Findable includes assigning a persistent identifier (like a DOI or Handle), having rich metadata to describe the data, and making sure it is findable through disciplinary and generalist discovery portals (local and international).
Making data Accessible may include making the data open using a standardised protocol. However, the data does not necessarily have to be open. There are sometimes good reasons why data cannot be made open, for example privacy concerns, national security or commercial interests. If it is not open, there should be clarity and transparency around the conditions governing access and reuse.
Identifiers are essential for identifying, finding, retrieving, linking and citing datasets. A Web address (URL) can be used to specify the online location of a resource, but URLs tend to change over time, which leads to broken links to the data. To be useful, identifiers need to be persistent and unique. Digital Object Identifiers (DOIs) and other persistent identifiers (PIDs) provide a permanent citable reference to a particular dataset. The DOI is a permanent fixed reference to the dataset no matter where it is located online, and it enables citation and citation metrics.
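As a rough illustration of how a DOI acts as a location-independent reference, the sketch below turns a bare DOI into a URL on the public doi.org resolver. The DOI shown is hypothetical and the function name `doi_to_url` is an assumption made for this example only.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (or one prefixed with 'doi:') into a resolvable URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' as-is; percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, used only for illustration.
print(doi_to_url("doi:10.1234/example-dataset.v1"))
```

The resolver URL stays the same even if the landing page for the dataset moves, which is the property a plain URL lacks.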
Services to create a persistent identifier are often offered by your affiliated institution or the repository you are using to describe your data. Talk to your library service for more information.
Read more: Persistent Identifiers
The identifier (preferably a persistent identifier) needs to be clearly stated in the metadata record describing the data collection, and also in any associated data files or metadata.
Comprehensive metadata records will include descriptive content that facilitates discovery, access and reuse of the data being described. While there is no 'one size fits all' list, comprehensive metadata should include:
Read more: Metadata
A rich metadata description alone does not ensure a dataset’s ‘findability’ on the internet; the dataset needs to be registered or indexed in a searchable resource such as a DCAT compliant data catalogue. This can be a generalist (e.g. Figshare, Dryad, Zenodo), domain-specific (e.g. PANGAEA, ADA, ICPSR), or institutional (e.g. ANU Research Repository, Sydney eScholarship Repository, data.gov.au) data repository or registry. Ideally these repositories/registries are indexed by search engines such as Google and/or Google Scholar.
Ideally, users would like to retrieve appropriate internet content directly and unhindered once they have located it. Internet protocols (e.g. HTTP and FTP) define rules and conventions for communication between devices, and tools and services such as APIs are available to facilitate this process.
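To give a feel for what a catalogue entry might look like, here is a simplified sketch of a dataset description using DCAT and Dublin Core terms, serialised as JSON-LD. Every value (title, URLs, DOI, keywords) is a placeholder, and the selection of fields is an assumption rather than a complete or required profile.

```python
import json

# Hypothetical dataset description; every value is a placeholder.
record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:identifier": "https://doi.org/10.1234/example-dataset.v1",
    "dct:title": "Example rainfall observations",
    "dct:description": "Placeholder description of the data collection.",
    "dcat:keyword": ["rainfall", "example"],
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": "https://data.example.org/rainfall.csv"},
        }
    ],
}


def to_jsonld(rec: dict) -> str:
    """Serialise the record as indented JSON-LD text."""
    return json.dumps(rec, indent=2, sort_keys=True)


print(to_jsonld(record))
```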
Protocols
HTTP (Hypertext Transfer Protocol) is the set of rules for transferring files (text, graphic images, sound, video, and other multimedia files) on the World Wide Web. As soon as a Web user opens their Web browser, the user is indirectly making use of HTTP.
FTP (File Transfer Protocol) is a standard Internet protocol for transmitting files between computers on the Internet over TCP/IP connections.
Read more: Web protocols
API (Application Programming Interface): When information is made available in any machine readable format, it becomes possible to make that information directly available to programs that request that information over the web. An API is the way this information is made directly available to other machines.
View more: What is an API?
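As a small sketch of what "making information available to programs" can look like over HTTP, the code below prepares (but does not send) a request that asks the doi.org resolver for machine-readable citation metadata instead of a web page. This relies on DOI content negotiation, which is offered by some registration agencies; the DOI is hypothetical and the function name is an assumption for this example.

```python
from urllib.request import Request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> Request:
    """Prepare an HTTP GET that asks for metadata as JSON rather than HTML."""
    url = "https://doi.org/" + doi
    return Request(url, headers={"Accept": CSL_JSON})


# Hypothetical DOI, used only for illustration.
req = build_metadata_request("10.1234/example-dataset.v1")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually send it (requires network access and a real DOI):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       body = response.read()
```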
Maintaining data resources online takes ongoing effort, so they are often neglected, especially over long periods of time. This often leads to broken links between the metadata and the data. Having at least a description of the data in the form of a metadata record means there is a record of the data's existence, which makes it possible for someone to track it down.
When selecting file formats for archiving, the formats should ideally be:
From: Stanford University Library - Best practices for file formats
File format examples:
Schema standards are schemas that have gone through a formal validation process by a standards organisation, such as the International Organization for Standardization (ISO), or an equivalent body such as the Dublin Core Metadata Initiative (DCMI). Commonly used and consistently applied metadata schemas that are well documented, endorsed, and maintained can also become ‘de-facto standards’, e.g. RIF-CS for describing data collections and services used in Research Data Australia.
Read more: Metadata schema
Standardised open and universal schemas, e.g. Data Catalog Vocabulary (DCAT), DataCite Metadata Schema, PROV, Dublin Core
Domain-specific standards, e.g. HPO - Human Phenotype Ontology, MeSH - Medical Subject Headings, Marine Community Profile, DDI - Data Documentation Initiative
For more examples of standards...
Resolvable global identifiers - An identifier is any label used to name something uniquely (whether online or offline). URLs are an example of an identifier. So are serial numbers, and personal names. A persistent identifier is guaranteed to be managed and kept up to date over a defined time period.
Read more: Persistent Identifiers
The goal is to create as many meaningful linkages as possible between (meta)data resources to enrich the contextual knowledge about the data, balanced against the time/energy involved in making a good data model.
Read more under the heading "(meta)data include qualified references to other (meta)data" in the following resource: https://www.dtls.nl/fair-data/fair-principles-explained/
For more information on Linked Data standards:
RDF - https://www.w3.org/RDF/
Other related linked data standards:
OWL - https://www.w3.org/OWL/, SKOS - https://www.w3.org/2004/02/skos/, SPARQL - https://www.w3.org/TR/rdf-sparql-query/
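A qualified reference can be pictured as an RDF triple: a subject, a predicate that says what kind of link it is, and an object. The sketch below writes two such links in N-Triples syntax using plain string formatting. The dataset, collection and paper URLs are hypothetical; the predicates are Dublin Core terms.

```python
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one IRI-to-IRI statement as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical resources, used only for illustration.
dataset = "https://data.example.org/dataset/rainfall-2020"
links = [
    (dataset, DCT + "isPartOf", "https://data.example.org/collection/climate"),
    (dataset, DCT + "references", "https://doi.org/10.1234/example-paper"),
]

lines = [ntriple(*link) for link in links]
print("\n".join(lines))
```

The predicate is what makes the reference "qualified": it says how the two resources relate, rather than only that they are connected.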
If data is not licensed, no one else can use it. In Australia, no licence is regarded as the same as 'all rights reserved', confining any reuse to very limited circumstances. Applying a Creative Commons licence to your data is a simple way to ensure that your data can be reused. The less restrictive the licence, the more that can be done with the data.
To make it easy for the Web to know when a work is available under a particular license, a “machine readable” version of the license provides a summary of the key freedoms and obligations written into a format that software systems, search engines, and other kinds of technology can understand.
Example from the Creative Commons License Chooser:
What appears in the metadata:
This work is licensed under a Creative Commons Attribution 4.0 International License.
Code snippet:
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"> <img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png"/> </a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/"> Creative Commons Attribution 4.0 International License</a>.
Read more: https://www.ands.org.au/working-with-data/publishing-and-reusing-data/licensing-for-reuse
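To show why the rel="license" attribute matters, here is a minimal sketch of how software might pick the licence URL out of a page. It uses only the Python standard library; the class and function names are assumptions for this example, and real harvesters are likely to be more thorough.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values from <a> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if tag == "a" and "license" in rel_values and href:
            if href not in self.licenses:  # skip repeated links
                self.licenses.append(href)


def find_licenses(html: str) -> list:
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


snippet = (
    '<a rel="license" href="http://creativecommons.org/licenses/by/4.0/">'
    "Creative Commons Attribution 4.0 International License</a>"
)
print(find_licenses(snippet))
```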
Data provenance documents where a piece of data comes from and the process and methodology by which it was produced. It is important for confirming the authenticity of data, which enables trust, credibility and reproducibility. This is becoming increasingly important, especially in the eScience community where research is data intensive and often involves complex data transformations and procedures.
Read more: Provenance vocabularies such as Provenir Ontology (PROV-O) and others
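As a rough sketch of how recorded provenance supports reproducibility, the example below stores "derived from" and "generated by" statements for two files and walks back to the original source. The property names are borrowed from the W3C PROV-O vocabulary; the file names, step names and the `lineage` helper are hypothetical.

```python
# Hypothetical provenance statements keyed by the derived file.
provenance = {
    "cleaned.csv": {
        "prov:wasDerivedFrom": "raw.csv",
        "prov:wasGeneratedBy": "clean_step",
    },
    "summary.csv": {
        "prov:wasDerivedFrom": "cleaned.csv",
        "prov:wasGeneratedBy": "aggregate_step",
    },
}


def lineage(entity: str, records: dict) -> list:
    """Follow prov:wasDerivedFrom links back to the earliest known source."""
    chain = [entity]
    seen = {entity}
    while entity in records:
        entity = records[entity]["prov:wasDerivedFrom"]
        if entity in seen:  # guard against circular records
            break
        seen.add(entity)
        chain.append(entity)
    return chain


print(lineage("summary.csv", provenance))
```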
The score was derived as follows:
Does the dataset have any identifiers assigned:
Is the dataset identifier included in all metadata records:
You answered which gave you out of a possible 1 point.
How is the data described with metadata:
You answered which gave you out of a possible 4 points.
What type of repository or registry is the metadata record in:
You answered which gave you out of a possible 4 points.
This gives you out of a possible 17 points.
The score was derived as follows:
How accessible is the data:
You answered which gave you out of a possible 5 points.
Is the data available online without requiring specialised protocols or tools once access has been approved:
You answered which gave you out of a possible 4 points.
Will the metadata record be available even if the data is no longer available:
You answered which gave you out of a possible 1 point.
This gives you out of a possible 10 points.
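The section total is the sum of the points earned per question, out of the sum of the maximum points. The sketch below works this through for the three accessibility questions, whose maximums (5, 4 and 1) add up to the stated 10. The short question keys and the earned points are hypothetical; the tool's actual scoring code is not shown in this text.

```python
# Maximum points per accessibility question, as listed above.
ACCESSIBLE_MAX = {
    "how_accessible": 5,
    "online_without_special_tools": 4,
    "metadata_persists": 1,
}


def section_score(earned: dict, maximum: dict) -> tuple:
    """Return (points earned, points possible) for one section."""
    for question, points in earned.items():
        if question not in maximum:
            raise KeyError(f"unknown question: {question}")
        if not 0 <= points <= maximum[question]:
            raise ValueError(f"points out of range for {question}")
    return sum(earned.values()), sum(maximum.values())


# Hypothetical answers, used only for illustration.
earned = {
    "how_accessible": 3,
    "online_without_special_tools": 4,
    "metadata_persists": 1,
}
got, possible = section_score(earned, ACCESSIBLE_MAX)
print(f"This gives you {got} out of a possible {possible} points.")
```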
The score was derived as follows:
What (file) format(s) is the data available in:
What best describes the types of vocabularies/ontologies/tagging schemas used to define the data elements:
You answered which gave you out of a possible 3 points.
How is the metadata linked to other data and metadata to provide context around the data:
You answered which gave you out of a possible 3 points.
This gives you out of a possible 8 points.
The score was derived as follows:
Does the dataset have any identifiers assigned?
You answered which gave you out of a possible 4 points.
Is the dataset identifier included in all metadata records:
You answered which gave you out of a possible 2 points.
This gives you out of a possible 6 points.