EBI Onboarding document

Welcome to the EBI HCA data wrangling & metadata team! Here, you will find some guidelines for your onboarding process.

Where to find general information

About the EBI

The EBI Intranet is a great resource and the search function is pretty good at finding relevant info. There is also a dedicated webpage for newcomers with lots of useful information that you might not yet even know you need to know.

EBI Human Cell Atlas team processes and procedures

Our main source of general information about our processes at the EBI branch of the HCA is the Confluence website. Take the information found there with a grain of salt, because much of it is outdated. You will need editing privileges on the wiki in order to fill out your Weekly Activity Report (WAR), so if you don’t see an edit button at the top right of the page, ask Oihane to give you the appropriate access. It is a good idea to have a brief look over the various pages that exist on Confluence, as you will most likely need to refer back to them later.

In particular take note of the following pages:

If anything isn’t clear, please ask Gabs or Oihane.

About the Human Cell Atlas

The HCA white paper gives a great view of the ethos of the project but is light on specifics on what will be done.

Feedback

This is a living document, and we want to make it better with every person that joins the team. If you have any ideas for improvement, for example topics that you think should be here but can’t find, please write them down and report back so that we can improve this documentation.

Wranglers currently working at the HCA DCP

Name                     Institution  E-mail
Enrique Sapena Ventura   EBI          enrique@ebi.ac.uk
Wei Kheng Teh            EBI          wteh@ebi.ac.uk
Ida Zucchi               EBI          idazucchi@ebi.ac.uk
Arsenios Chatzigeorgiou  EBI          arsenios@ebi.ac.uk
Parisa Nejad             UCSC         pnejad@ucsc.edu
William Sullivan         UCSC         wisulliv@ucsc.edu

General suggestions

First day

  • Send your GitHub username to Amnon so that you can get access to our GitHub organisation
    • You will need to set up two-factor authentication (2FA) on your GitHub account. Information about how to do this can be found here
  • Set up your computer. Do ask any of the wrangler team if you encounter any issues.
  • Set up your favourite web browser. It will make more sense later, but the usual “go-to” for the wranglers is Google Chrome, as some of the tasks that you’re going to do require some plug-ins.
  • For Mac Users: Look at the “Managed Software Centre” app thoroughly. Install every app you think you will need, asking for help when necessary.
  • You should install:
    • E-mail manager app. People use either the default one installed by your operating system or Thunderbird, though any e-mail manager should be fine (except for Gmail). Information about setting up your email for EBI users can be found on the intranet.
    • Notes app. You can use Google Keep, Evernote (pretty useful, and it’s on the recommended list), or any other app you prefer. Make sure you record every single thing that you do while working (it doesn’t have to be long, just a short, concise description of the job). It may seem like a major inconvenience at the beginning, but it will come in very handy when you have to fill in your timesheets, especially if you have to dedicate percentages of time to different grants.
    • Slack. It is highly recommended to set it up on your desktop rather than using the web app. It is optional whether you want to install the app on your phone.
  • Make sure you have all the necessary cables/adaptors to connect your Mac to your monitor, along with an appropriate keyboard and mouse.
  • Make sure you have access to:
  • Since your e-mail account was probably created some weeks before your actual starting day, you will probably have 100+ e-mails. Most of them won’t be of interest to you (mostly the older ones), but try to at least read the title before archiving or deleting them. Try not to delete any important e-mail: archive it instead, as it might be useful to revisit in the future.
  • At reception, they will give you some documents to fill in.
  • You will most likely have a meeting with your Human Resources (HR) officer where you will be provided with more forms and information. They will also guide you through your first steps. Don’t hesitate to ask them, they are here to help.
  • The bus timetables and routes can be found here.

First week

  • Make sure you check the calendar every day. There might be meetings you don’t want to miss. This is a general thing to do, but it will probably only take you a couple days to get used to.
  • The first week, your PL (Project Leader) will guide you through several meetings.
  • General meetings you will have are:
    • Introductory meeting with the PL
    • Expectations meeting
    • Induction meeting(s) with HR and other teams. They will help you set up and give you guidelines about the documentation you need to fill in and send once you get here. Take good notes, as there will be plenty!
  • Try to bookmark at least Confluence, SAP, the calendar, the git repositories that you use the most and the intranet. It comes in very handy.
  • Get used to git. Get ready to clone, pull, push, commit, and read the docs. We have really useful guides in there!

Each day

  • Keep a good record of what you are doing and how many hours you spend on it. If you have more than one cost centre, note which one each task belongs to and try to prioritise between them.

Each week

  • Fill in your timesheet (SAP)
  • Fill in the WAR (Confluence) - each Monday for the previous week of work

Specifics

Calendar

You can either set it up in your favourite calendar app or use Google Calendar. Most of the wranglers use Google Calendar, but it’s your decision to make. Thunderbird has a nifty Google Calendar extension which might be of interest. Once set up, you will notice that the events shown have different colours, which you can customise. The calendars that you should have access to are:

  • HCA Google calendar: This is where anyone working on the HCA at EBI records events that may affect the EBI team. They may or may not be relevant to wranglers. This calendar is managed by Oihane.
  • DCP Meeting calendar: This is where DCP-wide meetings are scheduled. Not all meetings are relevant to wranglers. It includes meetings that may span multiple sites (UCSC/Broad/EBI/CZI/Stanford).
  • AIT Calendar: This calendar is where the AIT team record AIT group meetings, vacation and sick leave, work from home (WFH) days and training.

If you can’t see any of the above calendars, request access from Amnon.

If you need to create an event (e.g., to book a meeting with someone), you can left-click on the appropriate time (It goes by 30-minute periods), fill out the event details (title names should be short and concise, if you need to explain further details you can do so in the event description), and click on “more options” if you need to invite a guest.

You can view the calendars of other team members, unless they have made them private. This is useful when planning meetings and trying to find a time that suits everyone.

Regular Meetings

Key meetings that wranglers attend are summarised in the table below. See chairing rota

Meeting                         Day        Time     Frequency  Notes
Operations planning/review      Wednesday  9.30am   Weekly
Dev sprint kick off meeting     Wednesday  10.30am  Bi-weekly  At start of EBI sprint, only wrangler on development needs to attend
EBI-DCP Sprint Planning & Demo  Tuesday    2pm      Bi-weekly  1 hour, telecon, at end of a sprint
EBI DCP Sprint Retro            Tuesday    4pm      Bi-weekly  1 hour, telecon
DCP Sprint Demo                 Tuesday    4pm      Bi-weekly  At end of DCP sprint (see DCP calendar)
AIT Team meetings               Thursday   3pm      Bi-weekly  In person, alternates between dev and general interest

HCA-wide Meetings

Scientific meetings are held a few times a year and are a great opportunity to connect directly with researchers involved in the HCA. However, the tickets for these events are limited so Wranglers aren’t guaranteed a spot. Project leads do their best to advocate for Wranglers attending so check on whether you are likely to attend the next one.

Slack

We use Slack a lot. If you haven’t already been invited to the two Slack instances below, please tell another member of the team or Gabs/Oihane and it can be easily sorted. There may also be other Slack instances that are useful to join, but you will be made aware of those as and when they become relevant.

HCA: https://join-hca-slack.data.humancellatlas.org/

AIT: https://tsc.ebi.ac.uk/article/slack-getting-started

Most Slack communication happens in the AIT Slack.

Useful channels include:

#hca - this is where most of the general communication about HCA takes place as well as announcements, reminders of stand up etc

#hca-operations - channel for discussions of operations tasks and submitting datasets to the DCP

#hca-development - channel for discussion about the current development work, sorting testing, answering questions etc

#hca-wrangler-metadata - this is where most of the discussion relating to wrangling and the HCA metadata takes place

#dcp-2 - This is a shared channel with the HCA Slack. It is used for general announcements and questions about the DCP. This is a good place to introduce yourself once you have logged in.

#dcp-ops - This is a shared channel with the HCA slack. It is used for operation discussion/announcements, including when releases are available for review or they go live.

In the HCA Slack, you will find the following open channels useful, although much less chat happens on them now.

#data-wrangling - general communication about wrangling and status of services associated with wrangling

#hca-metadata - the channel where metadata meeting announcements are posted and metadata discussion happens

#papers - Cell atlas relevant papers get posted here

You will need to ask Gabs to invite you to some private channels. These are used by different subsets of our team and collaborators to enable private communication.

Data-wrangling-int

ebi-wrangler-metadata

This is a current list of useful channels. Please always ask if a channel you are a member of is very quiet and you aren’t sure if you are missing anything or if you aren’t sure if there is a channel for a particular topic that you should join.

SAP

For EMBL-EBI Wranglers

SAP contains several important services, such as timesheets, leave records and some other useful administration requests. Your access to SAP might not be immediate. Request access from your PL and they will talk you through the process.

Timesheets

We are required to record our time working at the EBI in timesheets. A couple of clarifications:

  • If you don’t have access to your cost centres (Only EBI time budget displays, or it doesn’t display the correct one(s)), ask Gabs or Oihane to follow up. If you don’t know which cost centres you should use for your timesheet, don’t hesitate to ask your PL.
  • In order to fill in your timesheet, you need to have one row per cost centre per day. If you work for more than one cost centre in one day you will need to add a row, so don’t be shy to click the “add row” button if necessary!

Tips:

  • Keep the description of the work done short (it has to fit on one line).
  • Balance your time so that every week your percentages match the grants related to your job.
  • Submit your timesheet at least once a week, or ideally fill it in daily so that you record your time accurately while it’s fresh in your mind.
  • The hard deadline for submitting a timesheet is the last day of each calendar month, so that your supervisor has enough time to review and approve it.
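
One way to check the weekly balance from your own notes is to add up the hours per cost centre and turn them into percentages. The following is only a minimal sketch: the cost centre names (GRANT_A, GRANT_B), the hours and the function name are made up for illustration, and SAP itself is not involved.

```python
from collections import defaultdict


def cost_centre_percentages(rows):
    """Return the percentage of total hours booked to each cost centre.

    `rows` is a list of (day, cost_centre, hours) tuples, i.e. one entry
    per cost centre per day, mirroring the rows of the timesheet.
    """
    totals = defaultdict(float)
    for _day, cost_centre, hours in rows:
        totals[cost_centre] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cc: round(100 * h / grand_total, 1) for cc, h in totals.items()}


# Hypothetical week; the cost centre names and hours are invented.
week = [
    ("Mon", "GRANT_A", 7.5),
    ("Tue", "GRANT_A", 4.0),
    ("Tue", "GRANT_B", 3.5),
    ("Wed", "GRANT_B", 7.5),
    ("Thu", "GRANT_A", 7.5),
    ("Fri", "GRANT_A", 3.5),
    ("Fri", "GRANT_B", 4.0),
]

print(cost_centre_percentages(week))
```

Comparing the result with the split your PL gave you shows whether the week needs rebalancing before you submit.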

Google Docs and Drive

You have unlimited storage in Google Drive. Use it to store all the documents that you create and all the information from the datasets that you wrangle, and try to keep it in a structured hierarchy. You will go through this in an onboarding meeting.

Google Docs has most of the important documents that are used by the team on a regular basis. We don’t expect you to know all of them the first day, so ask for a link to the document if you can’t find it. Searching is useful for finding documents (if you know what you’re looking for) but if there’s anything you can’t find do ask.

A good place to start is by checking out the Brokering folder (search is your friend) which is mainly where we store documents related to our ongoing wrangling tasks. The spreadsheets are in: PROJECTS - [IN PROGRESS|FINISHED], etc

E-mail

You should be able to access your email straight away via webmail with the username and password given to you by your PL.

If you would also like to access your email using a desktop app you can follow the instructions here.

Your email should be checked throughout every working day. Part of your job as a wrangler consists of providing data contributors with all the help they might need. However, that doesn’t mean that you should drop everything you’re doing to answer an e-mail. Check the priorities and answer in an appropriate amount of time.

Central HCA email lists

Check your google groups and ask Oihane to add you if you are not a member of any of the following groups:

  • DCP Metadata Team
  • DCP Ingest Team
  • DCP Wrangler Team
  • metadata-community

These are mailing lists that are used by the DCP to reach the metadata and wrangling teams. The DCP Wrangler Team email (wrangler-team@data.humancellatlas.org) is also used by data contributors for help during the submission process.

The advantage of this system is that the history of the email address is all readily available and searchable on the relevant google group page: https://groups.google.com/a/data.humancellatlas.org

EBI/Sanger email lists

Additionally, make sure to send an email to ITSupport@ebi.ac.uk (cc’ing Gabs) and ask to be added to the ait-hca email list.

VPN

  • In order to be able to use most of the EBI services when you’re off-site, you will have to set up a VPN connection. You’ll have to set up 2FA (two-factor authentication) to be able to connect. A useful guide to do so can be found here. The easiest way is with a Google Authenticator-compatible app; choose whichever you prefer from the ones listed on the webpage.

Computing resources

Amazon cloud services

We have access to Amazon cloud services and mainly use them in our day-to-day for facilitating data upload.

S3 buckets are created by wranglers and used to allow contributors to upload data into an accessible location. They are also used to upload data into the ingest system. We primarily use hca-util for administering these areas, but you may also need to use the AWS CLI for some operations.

We also use an EC2 instance for performing some computing actions in the cloud. Please ask a developer to help you get access to this. It will involve setting up a set of SSH keys.

The address of the EC2 instance is tool.archive.data.humancellatlas.org

EBI compute resources

We also have access to various EBI compute resources including a cluster.

Github

The information here is general. You will go through all the specific details in the onboarding meetings.

Setup

  • Before beginning, you have to set up 2FA security for your GitHub user. You can find the instructions on how to do so here.
  • Install git-secrets
    • Ensure you also add the AWS patterns to your global configuration: git secrets --register-aws --global
  • Install the pre-commit Python package, which is to be used only within the metadata-schema repo.
  • It also makes things easier if you set up an SSH key. See GitHub’s guide for adding an SSH key.

General tips:

  • Get comfortable creating GitHub issues to discuss relevant topics.
  • When creating a new branch to make changes, always pull the latest changes from develop before checking out the new branch. develop is the branch you will mostly be changing, so you want to keep up to date with it.

The Wrangler’s most used repos

Repositories you will mainly use:

  • HumanCellAtlas/metadata-schema: This is where the schema and most of the documentation for making changes to these schemas is stored. You should read through the documents, but to get you started, you should look at commiters.md and structure.md. This will shed some light for you on your first tasks.
    • Ask to be added to the metadata-updates HCA GitHub team. Only members of this team will be allowed to commit to the metadata-schema repo.
  • ebi-ait/hca-ebi-wrangler-central: Main repo for tracking all general wrangler tasks. Read the docs, especially the ones in the SOPs (Standard Operating Procedures) folder.

Other repos to be aware of

Useful Bookmarks

General

Wrangling

Browser plug-ins

  • modHeader: Navigate ingest-api via browser, when authorisation token is required.

    Tip: Add a “Request URL Filter” of .*://api.ingest.archive.data.humancellatlas.org/.* so that the header is only sent to the ingest API and not to any other website.

  • Octotree: Adds easier navigation of repos in github
  • JSON viewer of your choice.
  • Zenhub: Our Zenhub access has expired, and we are no longer using it.
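
To see which addresses the “Request URL Filter” from the modHeader tip above would cover, the pattern can be tried out in Python. This is a rough sketch that assumes the plug-in treats the filter as a regular expression applied to the whole URL. The function name and both example URLs are made up for illustration and are not documented endpoints.

```python
import re

# Pattern copied from the modHeader tip. The unescaped dots match any
# character, which is looser than a literal hostname but fine for a check.
URL_FILTER = r".*://api.ingest.archive.data.humancellatlas.org/.*"


def header_would_be_sent(url):
    """Return True if the filter pattern matches the whole URL."""
    return re.fullmatch(URL_FILTER, url) is not None


# Hypothetical URLs: one on the ingest API host, one elsewhere.
ingest_url = "https://api.ingest.archive.data.humancellatlas.org/example"
other_url = "https://github.com/ebi-ait/hca-ebi-wrangler-central"

print(header_would_be_sent(ingest_url))
print(header_would_be_sent(other_url))
```

Only the first URL matches, which is the point of the filter: the authorisation header stays with the ingest API.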

Miscellanea

  • Printers: If you install the corresponding apps (you can search for them as “Printer” in the “Managed Software Centre” app), you will be able to use the Konica printers in both buildings. Using the printers has a small cost, which comes out of the group’s budget.
  • Campus wifi: The mobile phone coverage on campus is quite patchy, so you may wish to connect to the campus wifi. There is a guest wifi (WGCGuest) as well as eduroam. To access eduroam you will first need to register. There are instructions here.