
Query artifacts

Here, we'll query artifacts and inspect their metadata.

This guide can be skipped if you are only interested in how to leverage the overall collection.

import lamindb as ln
import bionty as bt
💡 connected lamindb: testuser1/test-scrna
ln.settings.transform.stem_uid = "agayZTonayqA"
ln.settings.transform.version = "1"
ln.track()
💡 notebook imports: bionty==0.42.9 lamindb==0.70.3
💡 saved: Transform(uid='agayZTonayqA5zKv', name='Query artifacts', key='scrna3', version='1', type='notebook', updated_at=2024-04-22 10:28:14 UTC, created_by_id=1)
💡 saved: Run(uid='LFmPPkLDY3LdtU4eGZbJ', transform_id=3, created_by_id=1)

Query artifacts by provenance metadata

users = ln.User.lookup()
ln.Transform.filter(created_by=users.testuser1).search("scrna")
name                                    uid               score
scRNA-seq                               Nv48yAceNSh85zKv  90.0
Standardize and append a batch of data  ManDYgmftZ8C5zKv  45.0
Query artifacts                         agayZTonayqA5zKv  36.0
transform = ln.Transform.filter(uid="Nv48yAceNSh85zKv").one()
ln.Artifact.filter(transform=transform).df()
uid storage_id key suffix accessor description version size hash hash_type n_objects n_observations transform_id run_id visibility key_is_virtual created_at updated_at created_by_id
id
1 NMz0UjANM5fCae2X70k8 1 None .h5ad AnnData Human immune cells from Conde22 None 57612943 9sXda5E7BYiVoDOQkTC0KB sha1-fl None 1648 1 1 1 True 2024-04-22 10:27:37.754373+00:00 2024-04-22 10:27:41.115168+00:00 1

Query artifacts by biological metadata

organism = bt.Organism.lookup()
tissues = bt.Tissue.lookup()
query = ln.Artifact.filter(
    organism=organism.human,
    tissues=tissues.bone_marrow,
)
query.df()
uid key suffix accessor description version size hash hash_type n_objects n_observations visibility key_is_virtual created_at updated_at storage_id transform_id run_id created_by_id
id

Inspect artifact metadata

query_set = ln.Artifact.filter().all()

artifact1, artifact2 = query_set[0], query_set[1]
artifact1.describe()
Artifact(uid='NMz0UjANM5fCae2X70k8', suffix='.h5ad', accessor='AnnData', description='Human immune cells from Conde22', size=57612943, hash='9sXda5E7BYiVoDOQkTC0KB', hash_type='sha1-fl', n_observations=1648, visibility=1, key_is_virtual=True, updated_at=2024-04-22 10:27:41 UTC)

Provenance:
  📎 storage: Storage(uid='R1FszoUJ', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local')
  📎 transform: Transform(uid='Nv48yAceNSh85zKv', name='scRNA-seq', key='scrna', version='1', type='notebook')
  📎 run: Run(uid='8Vvyad731ZW2UD3D2laJ', started_at=2024-04-22 10:25:44 UTC, is_consecutive=True)
  📎 created_by: User(uid='DzTjkKse', handle='testuser1', name='Test User1')
  📎 input_of (core.Run): ['2024-04-22 10:27:48 UTC']
Features:
  var: FeatureSet(uid='xrl2iYiGVIJt8zCjfADA', n=36503, type='number', registry='bionty.Gene')
    'CPOX', 'MEIS1-AS3', 'KRTAP10-6', 'SMPX', 'KRTAP19-3', 'SALL2', 'IGKV2OR2-10', 'SCYGR5', 'C1QL1', 'ZNF224', 'TRMT2A', 'MAK', 'SMYD4', 'VWDE', 'PPFIA4', 'OR13H1', 'ASTN2-AS1', 'DAW1', 'MIR4307HG', 'PRB1', ...
  obs: FeatureSet(uid='fEx9tSlrTiNzJQFcK9WB', n=4, registry='core.Feature')
    🔗 donor (12, core.ULabel): 'A36', 'A29', 'A31', 'D496', 'A37', '640C', '621B', 'A35', '582C', '637C', ...
    🔗 tissue (17, bionty.Tissue): 'skeletal muscle tissue', 'mesenteric lymph node', 'blood', 'spleen', 'jejunal epithelium', 'caecum', 'duodenum', 'thoracic lymph node', 'ileum', 'thymus', ...
    🔗 cell_type (32, bionty.CellType): 'gamma-delta T cell', 'alpha-beta T cell', 'plasmacytoid dendritic cell', 'germinal center B cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'effector memory CD4-positive, alpha-beta T cell', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', 'plasmablast', 'memory B cell', ...
    🔗 assay (3, bionty.ExperimentalFactor): '10x 3' v3', '10x 5' v2', '10x 5' v1'
Labels:
  📎 tissues (17, bionty.Tissue): 'skeletal muscle tissue', 'mesenteric lymph node', 'blood', 'spleen', 'jejunal epithelium', 'caecum', 'duodenum', 'thoracic lymph node', 'ileum', 'thymus', ...
  📎 cell_types (32, bionty.CellType): 'gamma-delta T cell', 'alpha-beta T cell', 'plasmacytoid dendritic cell', 'germinal center B cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'effector memory CD4-positive, alpha-beta T cell', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', 'plasmablast', 'memory B cell', ...
  📎 experimental_factors (3, bionty.ExperimentalFactor): '10x 3' v3', '10x 5' v2', '10x 5' v1'
  📎 ulabels (12, core.ULabel): 'A36', 'A29', 'A31', 'D496', 'A37', '640C', '621B', 'A35', '582C', '637C', ...
artifact1.view_lineage()
[Figure: lineage graph of artifact1]
artifact2.describe()
Artifact(uid='hiRXb8dAioaeadeYOpew', suffix='.h5ad', accessor='AnnData', description='10x reference adata', size=857752, hash='0Fozmib89XWbFoD6hSq5yA', hash_type='md5', n_observations=70, visibility=1, key_is_virtual=True, updated_at=2024-04-22 10:28:06 UTC)

Provenance:
  📎 storage: Storage(uid='R1FszoUJ', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local')
  📎 transform: Transform(uid='ManDYgmftZ8C5zKv', name='Standardize and append a batch of data', key='scrna2', version='1', type='notebook')
  📎 run: Run(uid='inEpPmUqWobUAS5M0i8r', started_at=2024-04-22 10:27:48 UTC, is_consecutive=True)
  📎 created_by: User(uid='DzTjkKse', handle='testuser1', name='Test User1')
Features:
  var: FeatureSet(uid='z8tmyDKwTgzdIDhPeyvw', n=754, type='number', registry='bionty.Gene')
    'EFHD2', 'RAB7A', 'S100A8', 'FAM30A', 'CD3G', 'POLD4', 'COX14', 'XCL1', 'ANXA1', 'DUSP2', 'SNHG32', 'CD247', 'NEDD8', 'UQCRC1', 'SP110', 'CD160', 'SH2D2A', 'SNORD3B-2', 'SPINT2', 'CST3', ...
  obs: FeatureSet(uid='uEZ4HV0EtL0ZDkzfphLU', n=1, registry='core.Feature')
    🔗 cell_type (9, bionty.CellType): 'CD38-positive naive B cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'cytotoxic T cell', 'B cell, CD19-positive', 'CD14-positive, CD16-negative classical monocyte', 'CD4-positive, alpha-beta T cell', 'dendritic cell'
Labels:
  📎 cell_types (9, bionty.CellType): 'CD38-positive naive B cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'cytotoxic T cell', 'B cell, CD19-positive', 'CD14-positive, CD16-negative classical monocyte', 'CD4-positive, alpha-beta T cell', 'dendritic cell'
artifact2.view_lineage()
[Figure: lineage graph of artifact2]

Compare features

Here, we compute the genes shared by the two artifacts:

artifact1_genes = artifact1.features["var"]
artifact2_genes = artifact2.features["var"]

shared_genes = artifact1_genes & artifact2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['HES4',
 'TNFRSF4',
 'SSU72',
 'PARK7',
 'RBP7',
 'SRM',
 'MAD2L2',
 'AGTRAP',
 'TNFRSF1B',
 'EFHD2']
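
To make the `&` step concrete: it keeps only the records that occur in both query sets. Below is a minimal plain-Python sketch of the same intersection idea. The two symbol lists are made-up stand-ins for illustration (not the artifacts' actual feature sets), and `shared_symbols` is a hypothetical helper, not part of lamindb.

```python
def shared_symbols(first, second):
    """Return the symbols present in both inputs, in the order of `first`."""
    second_set = set(second)  # set membership tests are O(1) on average
    seen = set()              # guards against duplicates in `first`
    shared = []
    for symbol in first:
        if symbol in second_set and symbol not in seen:
            shared.append(symbol)
            seen.add(symbol)
    return shared


# Illustrative stand-ins, NOT the real gene lists of artifact1 and artifact2.
genes_a = ["HES4", "TNFRSF4", "CPOX", "EFHD2", "MEIS1-AS3"]
genes_b = ["EFHD2", "RAB7A", "HES4", "S100A8"]

shared = shared_symbols(genes_a, genes_b)
print(shared)       # ['HES4', 'EFHD2']
print(len(shared))  # 2
```

Unlike a bare `set(genes_a) & set(genes_b)`, this sketch preserves the order of the first list, which makes the printed result reproducible.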

Compare cell types

artifact1_celltypes = artifact1.cell_types.all()
artifact2_celltypes = artifact2.cell_types.all()

shared_celltypes = artifact1_celltypes & artifact2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD16-positive, CD56-dim natural killer cell, human']

Load the individual artifacts

We can either load the artifacts into memory with .load() or access them in backed mode through .backed(), which loads their content lazily.

Let's load them into memory:

adata1 = artifact1.load()
adata2 = artifact2.load()

We can now subset the two AnnData objects to the shared cell types:

adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]
adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
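
As a rough picture of what the `.isin(...)` subsetting does: it builds one boolean per observation and keeps the rows where that boolean is true. The sketch below mimics this with plain Python lists. The `obs_rows` records and the `subset_by_labels` helper are hypothetical and only illustrate the masking idea, not how pandas or AnnData implement it.

```python
def subset_by_labels(rows, column, allowed):
    """Keep the rows whose `column` value is one of `allowed`."""
    allowed_set = set(allowed)
    # One boolean per row, analogous to the mask returned by `.isin(...)`.
    mask = [row[column] in allowed_set for row in rows]
    return [row for row, keep in zip(rows, mask) if keep]


# Hypothetical observations; the labels are borrowed from the outputs above.
obs_rows = [
    {"cell_id": "c1", "cell_type": "macrophage"},
    {"cell_id": "c2", "cell_type": "CD16-positive, CD56-dim natural killer cell, human"},
    {"cell_id": "c3", "cell_type": "memory B cell"},
]
shared_names = ["CD16-positive, CD56-dim natural killer cell, human"]

subset = subset_by_labels(obs_rows, "cell_type", shared_names)
print([row["cell_id"] for row in subset])  # ['c2']
```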