FAIR data assessment tool

Use this tool to assess whether your data is 'FAIR' - Findable, Accessible, Interoperable, Reusable.

Click the information button for explanatory information and links to wider resources.

Findable

Does the dataset have any identifiers assigned?

Globally unique and persistent (e.g. DOI, PURL, ARK or Handle)

Web address (URL)

Local identifier

No identifier

Is the dataset identifier included in all metadata records/files describing the data?

Yes

How is the data described with metadata?

Comprehensively using a recognised formal machine-readable metadata schema

Comprehensively, but in a text-based, non-standard format

Brief title and description

The data is not described

What type of repository or registry is the metadata record in?

Data is in one place but discoverable through several registries

Generalist public repository

Domain-specific repository

Local institutional repository

The data is not described in any repository

Accessible

How accessible is the data?

Publicly accessible

Fully accessible to persons who meet explicitly stated conditions, e.g. approval for sensitive data

A de-identified / modified subset of the data is publicly accessible

Embargoed access after a specified date

Unspecified conditional access e.g. contact the data custodian for access

Access to metadata only

No access to data or metadata

Is the data available online without requiring specialised protocols or tools once access has been approved?

Standard web service API (e.g. OGC)

Non-standard web service (e.g. OpenAPI/Swagger/informal API)

File download from online location

By individual arrangement

No access to data

Will the metadata record be available even if the data is no longer available?

Yes

Unsure

Interoperable

What (file) format(s) is the data available in?

In a structured, open standard, machine-readable format

In a structured, open standard, non-machine-readable format

Mostly in a proprietary format

What best describes the types of vocabularies/ontologies/tagging schemas used to define the data elements?

Standardised open and universal using resolvable global identifiers linking to explanations

Standardised vocabularies/ontologies/schema without global identifiers

No standards have been applied in the description of data elements

Data elements not described

How is the metadata linked to other data and metadata (to enhance context and clearly indicate relationships)?

Metadata is represented in a machine readable format, e.g. in a linked data format such as Resource Description Framework (RDF).

The metadata record includes URI links to related metadata, data and definitions

There are no links to other metadata

Reusable

Which of the following best describes the license/usage rights attached to the data?

Standard machine-readable license (e.g. Creative Commons)

Standard text based license

Non-standard machine-readable license (clearly indicating under what conditions the data may be reused)

Non-standard text-based license

No license

How much provenance information has been captured to facilitate data reuse?

Fully recorded in a machine readable format

Fully recorded in a text format

Partially recorded

No provenance information is recorded

Total FAIR Score

To learn more about FAIR data visit: https://www.crosslateral.com.au/blog/what-is-fair-data.html

We have the expertise to make your data FAIR. Contact us at info@crosslateral.com.au to find out more.

Disclaimer and credits

Tool Disclaimer: This FAIR data Self-Assessment Tool is provided purely for informational purposes. It is based on our interpretation of the FAIR Data Principles with the acknowledgement that there are other interpretations of the principles.
The scores arising from this tool are intended for self assessment purposes only and to trigger thinking and discussion around possible ways of making data more FAIR.

Credit: This tool is based on the ARDC FAIR data Self-Assessment Tool developed by ARDC, available at https://ardc.edu.au/resources/working-with-data/fair-data/fair-self-assessment-tool/

The source code for this tool was forked from the GitHub repository at https://github.com/au-research/FAIR-Data-Assessment-Tool.
The source code for the tool on this webpage is available at https://github.com/dxwell/FAIR-Data-Assessment-Tool.

About Findable

Making data Findable includes assigning a persistent identifier (like a DOI or Handle), having rich metadata to describe the data, and making sure it is findable through disciplinary and generalist discovery portals (local and international).

About Accessible

Making data accessible may include making the data open using a standardised protocol. However, the data does not necessarily have to be open. There are sometimes good reasons why data cannot be made open, for example privacy concerns, national security or commercial interests. If it is not open, there should be clarity and transparency around the conditions governing access and reuse.

About Interoperability

To be interoperable (i.e. interpretable by a computer, so that it can be automatically combined with other data), the data will need to use community-agreed formats, language and vocabularies. The metadata will also need to use community-agreed standards and vocabularies, and contain links to related information using identifiers.

About Reusable

Reusable data should maintain its initial richness. For example, it should not be abridged for the purpose of explaining the findings in one particular publication. It needs a clear machine-readable licence and provenance information on how the data was formed. It should also use discipline-specific data and metadata standards to give it rich contextual information that will allow for accurate interpretation and reuse.

About identifiers

Identifiers are essential for identifying, finding, retrieving, linking and citing datasets. A Web address (URL) can be used to specify the online location of a resource, but over time URLs tend to change, which leads to broken links to the data. To be useful, identifiers need to be persistent and unique. Digital Object Identifiers (DOIs) and other persistent identifiers (PIDs) provide a permanent citable reference to a particular dataset. The DOI is a permanent fixed reference to the dataset no matter where it is located online, and enables citation and citation metrics.

Services to create a persistent identifier are often offered by your affiliated institution or the repository you are using to describe your data. Talk to your library service for more information.
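
As a rough illustration of the difference between identifier types, the sketch below sorts an identifier string into the answer options used in the Findable question. The DOI pattern and the function name classify_identifier are assumptions made for this example; it only recognises DOIs, so a Handle, ARK or PURL would be reported as a plain Web address or local identifier.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, then "/" and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written; stripped before testing the bare DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def classify_identifier(identifier: str) -> str:
    """Return a rough category for an identifier string (DOIs only)."""
    value = identifier.strip()
    if not value:
        return "No identifier"
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    if DOI_PATTERN.match(value):
        return "Globally unique and persistent"
    if lowered.startswith(("http://", "https://")):
        return "Web address (URL)"
    return "Local identifier"


for example in ("https://doi.org/10.1234/abcd.5678", "https://example.org/data/42", "dataset-42"):
    print(example, "->", classify_identifier(example))
```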

About identifiers

The identifier (preferably a persistent identifier) needs to be clearly stated in the metadata record describing the data collection, and also in any associated data files or metadata.

About metadata descriptions

Comprehensive metadata records will include descriptive content that facilitates discovery, access and reuse of the data being described. While there is no 'one size fits all' list, comprehensive metadata should include:

a globally unique persistent identifier e.g. a DOI
a title
related people, i.e. the data creator or custodian
how to access the data and file formats
a description of how the data were created and how to interpret the data
subject or keywords
citation information that clearly indicates how the data should be cited
a machine-readable data licence
provenance and contextual information such as:
- links to related publications, projects, services and software
- methodology and processes involved in data production
spatial and temporal coverage (if relevant)
object-level data description

Providing metadata in a standard schema allows it to be read and used by machines as well as humans.
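
One possible machine-readable form of such a record is sketched below, using the schema.org Dataset vocabulary serialised as JSON-LD. This is only one of several suitable schemas, and every value (DOI, names, URLs) is a placeholder invented for the example. The REQUIRED tuple and missing_fields helper are likewise hypothetical and cover only part of the list above.

```python
import json

# Placeholder values for illustration only; this is not a real dataset.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "identifier": "https://doi.org/10.1234/example",
    "name": "Example rainfall observations",
    "description": "Daily rainfall totals, with notes on how they were collected.",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "keywords": ["rainfall", "climate"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "temporalCoverage": "2015-01-01/2019-12-31",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/rainfall.csv",
    },
}

# Assumed minimum set of fields for this sketch, not an official requirement.
REQUIRED = ("identifier", "name", "description", "creator", "license")


def missing_fields(rec: dict) -> list:
    """List the assumed-required fields that are absent or empty."""
    return [field for field in REQUIRED if not rec.get(field)]


print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```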

About metadata repositories

A rich metadata description alone does not ensure a dataset’s ‘findability’ on the internet; the dataset needs to be registered or indexed in a searchable resource such as a DCAT-compliant data catalogue. This may be a generalist (e.g. Figshare, Dryad, Zenodo), domain-specific (e.g. PANGAEA, ADA, ICPSR) or institutional (e.g. ANU Research Repository, Sydney eScholarship Repository, data.gov.au) data repository or registry. Ideally these repositories/registries are indexed by search engines such as Google and/or Google Scholar.

About data accessibility

Not all data that is discoverable can be freely accessed. Often there are embargoes, access controls, and access permissions associated with data due to a variety of issues such as privacy and commercial interests. Even with all these issues much sensitive data can be shared. Many other issues that could be perceived as blockers to sharing data may be overcome.

About protocols

Ideally, users would like to retrieve appropriate internet content directly and unhindered once they have located it. Internet protocols (e.g. HTTP and FTP) define rules and conventions for communication between devices, and tools and services such as APIs are available to facilitate this process.

Protocols

HTTP (Hypertext Transfer Protocol) is the set of rules for transferring files (text, graphic images, sound, video, and other multimedia files) on the World Wide Web. As soon as a Web user opens their Web browser, the user is indirectly making use of HTTP.

FTP (File Transfer Protocol) is a standard Internet protocol for transmitting files between computers on the Internet over TCP/IP connections. Read more: Web protocols

API (Application Programming Interface): When information is made available in any machine readable format, it becomes possible to make that information directly available to programs that request that information over the web. An API is the way this information is made directly available to other machines.

View more: What is an API?

OGC - http://www.opengeospatial.org/ogc
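
To make the idea of a standard web service API more concrete, the sketch below builds the GetCapabilities request that OGC web services such as WFS and WMS accept over plain HTTP. The endpoint URL is a hypothetical placeholder and the function name is invented for this example; no request is actually sent.

```python
from urllib.parse import urlencode


def capabilities_url(base_url: str, service: str = "WFS") -> str:
    """Build a GetCapabilities URL for an OGC service endpoint."""
    query = urlencode({"service": service, "request": "GetCapabilities"})
    return f"{base_url}?{query}"


# Hypothetical endpoint; replace with the address of a real service.
print(capabilities_url("https://example.org/geoserver/ows"))
```

Because the request is an ordinary HTTP URL with agreed parameter names, any client that understands the standard can discover what the service offers without individual arrangement.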

About metadata availability

Keeping data resources online takes ongoing effort, so they are often neglected, especially over long periods of time. This often leads to broken links between the metadata and the data. Having at least a description of the data in the form of a metadata record means there is a record of the data's existence, allowing the possibility of someone tracking it down.

About file formats

When selecting file formats for archiving, the formats should ideally be:

Non-proprietary
Unencrypted
Uncompressed
In common usage by the research community
Adherent to an open, documented standard, such as described by the State of California (see AB 1668, 2007)
- Interoperable among diverse platforms and applications
- Fully published and available royalty-free
- Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
- Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard.

From: Stanford University Library - Best practices for file formats

File format examples:

Structured, open standard, machine-readable format, e.g. (text) PDF/A, HTML, plain text; (images) TIFF, JPEG 2000, GIF; (audio) MP3, AIFF, WAVE; (video) MOV, MPEG, AVI; (tabular data) CSV
Structured, open standard, non-machine-readable format, e.g. PDF, HTML, JPG
Proprietary format, e.g. .doc (Word), .xls (Excel), .ppt (PowerPoint), .sav

ANDS
European Data Portal
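
A quick way to apply the file format examples above is to check file extensions against them. The sketch below uses only a subset of the listed formats and deliberately leaves out those that appear under more than one heading (such as HTML and PDF); the set names and the format_category function are assumptions for this example.

```python
from pathlib import Path

# Subset of the examples listed above; ambiguous formats are left out on purpose.
OPEN_MACHINE_READABLE = {".csv", ".txt", ".tif", ".tiff", ".jp2"}
PROPRIETARY = {".doc", ".xls", ".ppt", ".sav"}


def format_category(filename: str) -> str:
    """Return a rough format category based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in OPEN_MACHINE_READABLE:
        return "structured, open standard, machine-readable"
    if suffix in PROPRIETARY:
        return "proprietary"
    return "unknown"


for name in ("results.csv", "survey.sav", "notes.xyz"):
    print(name, "->", format_category(name))
```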

About schema standards

Schema standards are schemas that have gone through a formal validation process by a standards organisation, such as the International Organization for Standardization (ISO), or an equivalent body such as the Dublin Core Metadata Initiative (DCMI). Commonly used and consistently applied metadata schemas that are well documented, endorsed and maintained can also become ‘de-facto standards’, e.g. RIF-CS for describing data collections and services used in Research Data Australia.

About metadata links

The goal is to create as many meaningful linkages as possible between (meta)data resources to enrich the contextual knowledge about the data, balanced against the time/energy involved in making a good data model.

Machine-readable (meta)data

(Meta)data in a format that can be automatically read and processed by a computer, such as CSV, JSON, XML etc. For more information: Open Data Handbook

Linked Data

Linked Data (also known as Linking Data) can be applied to improve the exploitation of the “Web of data.” The expression refers to the publishing of structured data in a way that typed links are created between data from different sources to provide a higher level of usability. By using Linked Data, it is possible to find other, related data. Structured data should meet four requirements to be called Linked Data:

URIs should be assigned to all entities of the dataset.
HTTP URIs are required to ensure that all entities can be referenced and cited by users and user agents.
Entities should be described using standard formats such as RDF/XML.
Links should be created to other, related entity URIs.

All data that fulfil these requirements and are released for the public are called Linked Open Data (LOD). http://www.lesliesikos.com/semantic-web-machine-readable-structured-data-with-meaningful-annotations/

Read more under the heading (meta)data include qualified references to other (meta)data in the following resource: https://www.dtls.nl/fair-data/fair-principles-explained/
For more information on Linked Data standards - RDF - https://www.w3.org/RDF/
Other related linked data standards: OWL - https://www.w3.org/OWL/ , SKOS - https://www.w3.org/2004/02/skos/ , SPARQL - https://www.w3.org/TR/rdf-sparql-query/
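
The four Linked Data requirements can be illustrated with a few RDF statements. The sketch below writes them in N-Triples, a simpler RDF serialisation than the RDF/XML mentioned above, using Dublin Core terms as predicates. The dataset and publication URIs are hypothetical, and the triple helper is a minimal illustration that only escapes backslashes and double quotes in literals.

```python
def triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialise one statement as an N-Triples line."""
    if literal:
        # Minimal escaping only: backslashes and double quotes.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        target = f'"{escaped}"'
    else:
        target = f"<{obj}>"
    return f"<{subject}> <{predicate}> {target} ."


# Hypothetical HTTP URI identifying the dataset (requirements 1 and 2).
DATASET = "https://example.org/dataset/42"

lines = [
    # Description in a standard RDF format (requirement 3).
    triple(DATASET, "http://purl.org/dc/terms/title", "Example rainfall observations", literal=True),
    # Links to other, related entity URIs (requirement 4).
    triple(DATASET, "http://purl.org/dc/terms/license", "https://creativecommons.org/licenses/by/4.0/"),
    triple(DATASET, "http://purl.org/dc/terms/isReferencedBy", "https://doi.org/10.1234/example-paper"),
]
print("\n".join(lines))
```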

About licenses

If data is not licensed, no one else can use it. In Australia, no licence is regarded as the same as 'all rights reserved', confining any reuse to very limited circumstances. Applying a Creative Commons licence to your data is a simple way to ensure that your data can be reused. The less restrictive the licence, the more that can be done with the data.

To make it easy for the Web to know when a work is available under a particular license, a “machine readable” version of the license provides a summary of the key freedoms and obligations written into a format that software systems, search engines, and other kinds of technology can understand.

Example from the Creative Commons License Chooser:

What appears in the metadata:

This work is licensed under a Creative Commons Attribution 4.0 International License.

Code snippet:

            <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">
            <img alt="Creative Commons License" style="border-width:0"
            src="https://i.creativecommons.org/l/by/4.0/88x31.png"/>
            </a><br />This work is licensed under a
            <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">
            Creative Commons Attribution 4.0 International License</a>.
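
To show what "machine readable" means in practice, the sketch below uses Python's standard html.parser module to pull the licence URL out of markup like the snippet above by looking for rel="license". The class name LicenseLinkParser is made up for this example, and real pages may declare licences in other ways that this does not detect.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> and <link> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").split()
        href = attr.get("href")
        if tag in ("a", "link") and "license" in rel_values and href:
            if href not in self.licenses:
                self.licenses.append(href)


html = """<a rel="license" href="http://creativecommons.org/licenses/by/4.0/">
Creative Commons Attribution 4.0 International License</a>"""

parser = LicenseLinkParser()
parser.feed(html)
print(parser.licenses)
```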

Read more: https://www.ands.org.au/working-with-data/publishing-and-reusing-data/licensing-for-reuse

About provenance

Data provenance is used to document where a piece of data comes from and the process and methodology by which it was produced. It is important for confirming the authenticity of data, enabling trust, credibility and reproducibility. This is becoming increasingly important, especially in the eScience community, where research is data intensive and often involves complex data transformations and procedures. Read more: provenance vocabularies such as the W3C PROV Ontology (PROV-O) and others
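
A provenance record in machine-readable form might look something like the sketch below, which uses a few PROV-O terms in JSON-LD. All identifiers and the timestamp are invented placeholders, and a real record would normally carry far more detail about the methodology.

```python
import json

# Placeholder URIs and timestamp, for illustration only.
provenance = {
    "@context": {"prov": "http://www.w3.org/ns/prov#"},
    "@id": "https://example.org/dataset/42",
    "@type": "prov:Entity",
    # The source data this dataset was produced from.
    "prov:wasDerivedFrom": {"@id": "https://example.org/dataset/raw-41"},
    # The process that produced it.
    "prov:wasGeneratedBy": {
        "@id": "https://example.org/activity/cleaning-run-7",
        "@type": "prov:Activity",
        "prov:endedAtTime": "2020-03-01T10:00:00Z",
    },
    # The person or organisation responsible.
    "prov:wasAttributedTo": {"@id": "https://example.org/people/a-researcher"},
}

print(json.dumps(provenance, indent=2))
```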

About Findable Score

The score was derived as follows:

Does the dataset have any identifiers assigned:
You answered which gave you out of a possible 8 points.

Is the dataset identifier included in all metadata records:
You answered which gave you out of a possible 1 point.

How is the data described with metadata:
You answered which gave you out of a possible 4 points.

What type of repository or registry is the metadata record in:
You answered which gave you out of a possible 4 points.

This gives you out of a possible 17 points.

About Accessible Score

The score was derived as follows:

How accessible is the data:
You answered which gave you out of a possible 5 points.

Is the data available online without requiring specialised protocols or tools once access has been approved:
You answered which gave you out of a possible 4 points.

Will the metadata record be available even if the data is no longer available:
You answered which gave you out of a possible 1 point.

This gives you out of a possible 10 points.

About Interoperable Score

The score was derived as follows:

What (file) format(s) is the data available in:
You answered which gave you out of a possible 2 points.

What best describes the types of vocabularies/ontologies/tagging schemas used to define the data elements:
You answered which gave you out of a possible 3 points.

How is the metadata linked to other data and metadata to provide context around the data:
You answered which gave you out of a possible 3 points.

This gives you out of a possible 8 points.

About Reusable Score

The score was derived as follows:

Which of the following best describes the license/usage rights attached to the data:
You answered which gave you out of a possible 4 points.

How much provenance information has been captured to facilitate data reuse:
You answered which gave you out of a possible 2 points.

This gives you out of a possible 6 points.
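
The arithmetic behind the section scores can be sketched as follows. The maximum points per question are taken from the breakdowns above. Treating the Total FAIR Score as a simple sum of the four sections is an assumption of this example, and the answers in the example dictionary are hypothetical.

```python
# Maximum points per question, taken from the score breakdowns above.
MAX_POINTS = {
    "Findable": [8, 1, 4, 4],
    "Accessible": [5, 4, 1],
    "Interoperable": [2, 3, 3],
    "Reusable": [4, 2],
}


def section_totals(awarded: dict) -> dict:
    """Return (points, maximum) per section, validating each answer."""
    totals = {}
    for section, maxima in MAX_POINTS.items():
        points = awarded.get(section, [0] * len(maxima))
        if len(points) != len(maxima):
            raise ValueError(f"{section}: expected {len(maxima)} answers")
        for got, cap in zip(points, maxima):
            if not 0 <= got <= cap:
                raise ValueError(f"{section}: {got} is outside 0..{cap}")
        totals[section] = (sum(points), sum(maxima))
    return totals


# Hypothetical answers, for illustration only.
example = {
    "Findable": [8, 1, 2, 3],
    "Accessible": [5, 4, 1],
    "Interoperable": [1, 2, 0],
    "Reusable": [4, 1],
}

totals = section_totals(example)
# Assumption: the overall score is the plain sum of the section scores.
overall = sum(got for got, _ in totals.values())
overall_max = sum(cap for _, cap in totals.values())
print(totals)
print(f"Total: {overall} out of a possible {overall_max} points")
```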