Snapshot Package

Snapshot

class factiva.news.snapshot.Snapshot(user_key=None, user_stats=False, query=None, snapshot_id=None)

Bases: BulkNewsBase

Represent a Factiva Snapshot Class.

Class that represents a Factiva Snapshot.

Parameters
  • user_key (str or UserKey) – String containing the 32-character long API key. If not provided, the constructor will try to obtain its value from the FACTIVA_USERKEY environment variable.

  • user_stats (boolean, optional (Default: False)) – Indicates if user data has to be pulled from the server. This operation fills account detail properties along with maximum, used and remaining values. It may take several seconds to complete.

  • query (str or SnapshotQuery, optional) – Query used to run any of the Snapshot-related operations. If a str is provided, a simple query with a where clause is created. If other query fields are required, either provide the SnapshotQuery object at creation, or set the appropriate object values after creation. This parameter is not compatible with snapshot_id.

  • snapshot_id (str, optional) – String containing the 10-character long Snapshot ID. This parameter is not compatible with query.
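
The constructor rules above (falling back to FACTIVA_USERKEY, and query and snapshot_id being mutually exclusive) can be sketched as a small standalone check. Note that resolve_snapshot_args is a hypothetical helper and not part of factiva.news, and the use of ValueError is an assumption; the library may signal these conditions differently.

```python
import os


def resolve_snapshot_args(user_key=None, query=None, snapshot_id=None):
    """Hypothetical helper mirroring the documented constructor rules."""
    # Fall back to the FACTIVA_USERKEY environment variable when no key is given.
    if user_key is None:
        user_key = os.environ.get("FACTIVA_USERKEY")
    if user_key is None:
        raise ValueError("No user key given and FACTIVA_USERKEY is not set")
    # Assumption: a key passed as a string must be exactly 32 characters long.
    if isinstance(user_key, str) and len(user_key) != 32:
        raise ValueError("user_key must be 32 characters long")
    # query and snapshot_id are documented as incompatible with each other.
    if query is not None and snapshot_id is not None:
        raise ValueError("query and snapshot_id cannot be used together")
    return {"user_key": user_key, "query": query, "snapshot_id": snapshot_id}
```

Running such a check before creating a Snapshot surfaces a missing key or a conflicting argument pair before any request is sent.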

See also

Stream

Class that represents the continuous Factiva News document stream.

Examples

Creating a new Snapshot with a key string and a Where statement, then running a full Explain process.
>>> from factiva.news.snapshot import Snapshot
>>> my_key = "abcd1234abcd1234abcd1234abcd1234"
>>> my_query = "publication_datetime >= '2020-01-01 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key=my_key, query=my_query)
>>> my_snapshot.process_explain()
106535
Creating a new Snapshot from UserKey and SnapshotQuery instances, then running a full Analytics process.
>>> my_user = UserKey()
>>> my_query = SnapshotQuery("publication_datetime >= '2020-01-01 00:00:00' AND LOWER(language_code) = 'en'")
>>> my_query.frequency = 'YEAR'
>>> my_query.group_by_source_code = True
>>> my_query.top = 20
>>> my_snapshot = Snapshot(user_key=my_user, query=my_query)
>>> analytics_df = my_snapshot.process_analytics()
>>> analytics_df.head()
   count  publication_datetime source_code
0  20921                  1995      NGCIOS
1  20371                  1995       LATAM
2  18303                  1995      REUTES
3  10593                  1995      EXPNSI
4   4212                  1995       MUNDO
download_extraction_files(download_path=None)

Download the list of files listed in the Snapshot.last_extraction_job.files property.

Downloads the list of files listed in the Snapshot.last_extraction_job.files property, and stores them in a folder indicated by download_path. If no download_path is provided, then files are stored in a folder with the same name as the snapshot ID.

Parameters

download_path (str, optional) – String containing the file path where the files will be stored. If not provided, files are stored in a folder with the same name as the snapshot ID.

Returns

True if the files were correctly downloaded, False otherwise.

Return type

Boolean
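
The default-folder rule described above can be illustrated with a short sketch. Note that resolve_download_path is a hypothetical helper written for this example; it only computes the target folder and does not call the Factiva API or create anything on disk.

```python
from pathlib import Path


def resolve_download_path(snapshot_id, download_path=None):
    """Hypothetical helper: pick the folder where files would be stored."""
    # Documented default: a folder with the same name as the snapshot ID.
    if download_path is None:
        return Path(snapshot_id)
    # Otherwise use the path supplied by the caller as-is.
    return Path(download_path)
```

For instance, resolve_download_path('sdjjekl93j') yields a relative folder named sdjjekl93j, while an explicit download_path is returned unchanged.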

download_update_files(download_path=None)

Download the list of files listed in the Snapshot.last_update_job.files property.

Downloads the list of files listed in the Snapshot.last_update_job.files property, and stores them in a folder indicated by download_path. If no download_path is provided, then files are stored in a folder with the same name as the update ID.

Parameters

download_path (str, optional) – String containing the file path on where to store the files. If not provided, files are stored in a folder with the same name as the update ID.

Raises

- RuntimeError when an update job has not been submitted.

Returns

True if the files were correctly downloaded, False otherwise.

Return type

Boolean

file_format = ''
file_list = []
folder_path = ''
get_analytics_job_results()

Obtain the Analytics job results from the Factiva Snapshots API.

Obtains the Analytics job results from the Factiva Snapshots API. Results are stored in the last_analytics_job class property.

Returns

True if the data was retrieved successfully; an Exception is raised otherwise.

Return type

Boolean

get_explain_job_results()

Obtain the Explain job results from the Factiva Snapshots API.

Obtains the Explain job results from the Factiva Snapshots API. Results are stored in the last_explain_job class property.

Returns

True if the data was retrieved successfully; an Exception is raised otherwise.

Return type

Boolean

get_explain_job_samples(num_samples=10)

Obtain the Explain job samples from the Factiva Snapshots API.

Returns a list of up to 100 sample documents (no full text), each including title and metadata fields.

Parameters

num_samples (int, optional (Default: 10)) – Number of sample documents to retrieve for the Explain job.

Returns

True if the data was retrieved successfully; an Exception is raised otherwise.

Return type

Boolean
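
Since the method returns at most 100 samples, a caller may want to validate num_samples before making the request. The following is a minimal sketch under that assumption; check_num_samples is a hypothetical helper, and whether the library itself rejects or truncates larger values is not stated here.

```python
def check_num_samples(num_samples=10, max_samples=100):
    """Hypothetical pre-check for the num_samples argument."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(num_samples, bool) or not isinstance(num_samples, int):
        raise TypeError("num_samples must be an int")
    # Assumption: valid requests ask for between 1 and max_samples documents.
    if not 1 <= num_samples <= max_samples:
        raise ValueError(f"num_samples must be between 1 and {max_samples}")
    return num_samples
```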

get_extraction_job_results()

Obtain the Extraction job results from the Factiva Snapshots API.

Obtains the Extraction job results from the Factiva Snapshots API. Results are stored in the last_extraction_job class property.

Returns

True if the data was retrieved successfully; an Exception is raised otherwise.

Return type

Boolean

get_update_job_results()

Obtain the Update Job results from the Factiva Snapshots API.

Obtains the Update Job results from the Factiva Snapshots API. Results are stored in the last_update_job class property.

Raises

- RuntimeError when an update job has not been submitted.

Returns

True if the data was retrieved successfully; an Exception is raised otherwise.

Return type

Boolean

last_analytics_job = None
last_explain_job = None
last_extraction_job = None
last_update_job = None
news_data = None
process_analytics()

Submit an Analytics job to the Factiva Snapshots API.

Submits an Analytics job to the Factiva Snapshots API, using the same parameters used by submit_analytics_job. Then, monitors the job until its status changes to JOB_STATE_DONE. Finally, retrieves and stores the results in the property last_analytics_job.

Returns

True if the analytics processing was successful; an Exception is raised otherwise.

Return type

Boolean

Examples

Process analytics job
>>> query_clause = "publication_datetime >= '2018-01-01 00:00:00' AND publication_datetime <= '2018-01-02 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key='abcd1234abcd1234abcd1234abcd1234', query=query_clause)
>>> my_snapshot.process_analytics()
>>> print(my_snapshot.last_analytics_job.data)
    publication_datetime   count
0               2018-01  950516
1               2018-02  929795
2               2018-03  998663
3               2018-04  935845
4               2018-05  894903
5               2018-06  876938
6               2018-07  867509
7               2018-08  793283
8               2018-09  858963
9               2018-10  957739
10              2018-11  917355
11              2018-12   38401
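
The submit, monitor and retrieve flow used by the process_* methods can be pictured as a polling loop. This is a minimal sketch and not the library's implementation: wait_until_done and get_state are hypothetical names, the poll interval and limit are arbitrary, and every state label other than JOB_STATE_DONE is an assumption.

```python
import time


def wait_until_done(get_state, poll_seconds=10, max_polls=360, sleep=time.sleep):
    """Poll get_state() until it returns 'JOB_STATE_DONE'.

    get_state is any zero-argument callable returning the current job
    state string; it stands in for a status request to the Snapshots API.
    """
    for _ in range(max_polls):
        state = get_state()
        if state == "JOB_STATE_DONE":
            return True
        # Assumption: a state ending in FAILED means the job will not finish.
        if state.endswith("FAILED"):
            raise RuntimeError(f"Job ended with state {state}")
        # Wait before asking again to avoid flooding the API.
        sleep(poll_seconds)
    raise TimeoutError("Job did not reach JOB_STATE_DONE in time")
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real waiting, for example by supplying a function that only records the requested delay.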
process_explain()

Submit an Explain job to the Factiva Snapshots API.

Submits an Explain job to the Factiva Snapshots API, using the same parameters used by submit_explain_job. Then, monitors the job until its status changes to JOB_STATE_DONE. Finally, retrieves and stores the results in the property last_explain_job.

Returns

True if the explain processing was successful; an Exception is raised otherwise.

Return type

Boolean

Examples

Process explain job from snapshot
>>> query_clause = "publication_datetime >= '2018-01-01 00:00:00' AND publication_datetime <= '2018-01-02 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key='abcd1234abcd1234abcd1234abcd1234', query=query_clause)
>>> try:
...     my_snapshot.process_explain()
... except RuntimeError:
...     print('There was an error with the API call')
...
>>> print(my_snapshot.last_explain_job.document_volume)
450483
process_extraction(download_path=None)

Submit an Extraction job to the Factiva Snapshots API.

Submits an Extraction job to the Factiva Snapshots API, using the same parameters used by submit_extraction_job. Then, monitors the job until its status changes to JOB_STATE_DONE. The final status is retrieved and stored in the property last_extraction_job, which, among other properties, contains the list of files to download. The process then downloads all files to the specified download_path. If no download path is provided, files are stored in a folder named after the snapshot_id property. The process ends after all files are downloaded.

Because the whole process takes place in a single call, this operation is expected to take several minutes, or even hours.

Parameters

download_path (str, optional) – String containing the file path where the files will be stored. If not provided, files are stored in a folder with the same name as the snapshot ID.

Returns

True if the extraction processing was successful; an Exception is raised otherwise.

Return type

Boolean

Examples

Process extraction job.
>>> query_clause = "publication_datetime >= '2018-01-01 00:00:00' AND publication_datetime <= '2018-01-02 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key='abcd1234abcd1234abcd1234abcd1234', query=query_clause)
>>> my_snapshot.process_extraction(download_path='../downloads/data')
process_update(update_type, download_path=None)

Submit an Update job to the Factiva Snapshots API.

Submits an Update job to the Factiva Snapshots API, using the same parameters used by submit_update_job. Then, monitors the job until its status changes to JOB_STATE_DONE. The final status is retrieved and stored in the property last_update_job, which, among other properties, contains the list of files to download. The process then downloads all files to the specified download_path. If no download path is provided, files are stored in a folder named after the last_update_job.job_id property.

Because the whole process takes place in a single call, this operation is expected to take several minutes, or even hours.

Parameters
  • update_type (str) – String containing the update type to submit a job. Could be ‘additions’, ‘replacements’ or ‘deletes’.

  • download_path (str, optional) – String containing the file path on where to store the files. If not provided, files are stored in a folder with the same name as the update ID.

Returns

True if the update processing was successful; an Exception is raised otherwise.

Return type

Boolean

Examples

Process update job with type ‘additions’
>>> previous_snapshot = Snapshot(user_key=my_user, snapshot_id='sdjjekl93j')
>>> previous_snapshot.process_update('additions', download_path=f'./{previous_snapshot.snapshot_id}/additions/')
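
To run all three update types for one snapshot, the per-type folder layout used in the example above can be generated up front. This is a sketch only: update_download_paths and the UPDATE_TYPES tuple are hypothetical names introduced here, and the folder layout is a convention taken from the example rather than something the library enforces.

```python
# The three update types named in the parameter description.
UPDATE_TYPES = ("additions", "replacements", "deletes")


def update_download_paths(snapshot_id, base_dir="."):
    """Hypothetical helper: one download folder per update type."""
    return {
        update_type: f"{base_dir}/{snapshot_id}/{update_type}/"
        for update_type in UPDATE_TYPES
    }


# Possible usage with an existing Snapshot instance (not executed here):
# for update_type, path in update_download_paths("sdjjekl93j").items():
#     previous_snapshot.process_update(update_type, download_path=path)
```

Keeping each update type in its own folder avoids mixing additions, replacements and deletes when the files are processed later.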
query = None
submit_analytics_job()

Submit an Analytics job to the Factiva Snapshots API.

Submits an Analytics job to the Factiva Snapshots API, using the assigned user in the user_key, and SnapshotQuery in the query properties.

Returns

True if the submission was successful; an Exception is raised otherwise.

Return type

Boolean

submit_explain_job()

Submit an Explain job to the Factiva Snapshots API.

Submits an Explain job to the Factiva Snapshots API, using the assigned user in the user_key, and SnapshotQuery in the query properties.

Returns

True if the submission was successful; an Exception is raised otherwise.

Return type

Boolean

submit_extraction_job()

Submit an Extraction job to the Factiva Snapshots API.

Submits an Extraction job to the Factiva Snapshots API, using the assigned user in the user_key, and SnapshotQuery in the query properties.

Returns

True if the submission was successful; an Exception is raised otherwise.

Return type

Boolean

submit_update_job(update_type)

Submit an Update Job to the Factiva Snapshots API.

Submits an Update Job to the Factiva Snapshots API, using the user assigned in user_key, the snapshot_id assigned to the instance, and the update_type passed as a parameter. Assigns the submitted job to the last_update_job property.

Parameters

update_type (str) – String containing the update type to submit a job. Could be ‘additions’, ‘replacements’ or ‘deletes’.

Returns

True if the submission was successful; an Exception is raised otherwise.

Return type

Boolean

Snapshot Jobs

class factiva.news.snapshot.ExplainJob(user_key)

Bases: BulkNewsJob

Represent the operation of creating an Explain job from the Factiva Snapshots API.

document_volume = 0
extraction_type = 'documents'
get_endpoint_url()

Get endpoint URL.

get_job_id(source)

Get job ID.

set_job_data(source)

Set job data.

class factiva.news.snapshot.AnalyticsJob(user_key)

Bases: BulkNewsJob

Represent the operation of creating Analytics from the Factiva Snapshots API.

data = []
get_endpoint_url()

Get endpoint URL.

get_job_id(source)

Get job ID.

set_job_data(source)

Set job data.

class factiva.news.snapshot.ExtractionJob(snapshot_id=None, user_key=None)

Bases: BulkNewsJob

Class that represents the operation of creating a Snapshot from Factiva Snapshots API.

file_format = ''
files = []
get_endpoint_url()

Obtain endpoint URL.

get_job_id(source)

Obtain Job ID.

process_job(payload=None, path=None)

Override method from parent class to call the method for downloading the files once the snapshot has been completed.

Overrides method from parent class to call the method for downloading the files once the snapshot has been completed.

Parameters
  • payload (str, Optional) – String containing the snapshot instance.

  • path (str, Optional) – String containing the path where the files downloaded from the snapshot are stored. If no path is given, the files will be stored in a folder named after the snapshot_id in the current working directory.

set_job_data(source)

Set job data.

class factiva.news.snapshot.UpdateJob(update_type=None, snapshot_id=None, update_id=None, user_key=None)

Bases: ExtractionJob

Represent the Snapshot Updates.

Class that represents the Snapshot Updates. There can be three types of updates: additions, replacements and deletes.

Parameters
  • update_type (str, Optional) – String describing the type of update that this job represents. Requires snapshot_id to be provided as well. Not compatible with update_id.

  • snapshot_id (str, Optional) – String containing the ID of the snapshot that is being updated. Requires update_type to be provided as well. Not compatible with update_id.

  • update_id (str, Optional) – String containing the ID of an update job that has been created previously. Both update_type and snapshot_id can be obtained from this value. Not compatible with update_type or snapshot_id.

Raises

- Exception when fields that are not compatible are provided or when not enough parameters are provided to create the job.
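
The parameter compatibility rules listed above can be summarised as a small decision function. This is a sketch under stated assumptions: check_update_job_args and its return labels are hypothetical, and ValueError (a subclass of Exception) is used here only because the exact exception type raised by UpdateJob is not specified.

```python
def check_update_job_args(update_type=None, snapshot_id=None, update_id=None):
    """Hypothetical check of the documented UpdateJob argument rules.

    Returns 'by_update_id' when an existing update job is referenced, or
    'by_snapshot' when a new update is described by type and snapshot.
    """
    if update_id is not None:
        # update_id cannot be combined with update_type or snapshot_id.
        if update_type is not None or snapshot_id is not None:
            raise ValueError(
                "update_id is not compatible with update_type or snapshot_id"
            )
        return "by_update_id"
    # Without update_id, both update_type and snapshot_id are required.
    if update_type is None or snapshot_id is None:
        raise ValueError("update_type and snapshot_id must be provided together")
    return "by_snapshot"
```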

get_endpoint_url()

Get endpoint URL.

get_job_id(source)

Get job ID from source.

snapshot_id = None
update_type = None

Snapshot Query

class factiva.news.snapshot.SnapshotQuery(where, includes=None, excludes=None, select_fields=None, limit=0, file_format='avro', frequency='MONTH', date_field='publication_datetime', group_by_source_code=None, group_dimensions=None, top=10)

Bases: BulkNewsQuery

Implement Snapshot Query class definition.
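
The where argument is a plain string, so it can be assembled programmatically before being passed to SnapshotQuery. The sketch below builds clauses in the same shape as the examples in this page; build_where is a hypothetical helper, and only the publication_datetime and language_code fields shown in those examples are used.

```python
from datetime import datetime


def build_where(start, end=None, language_code=None):
    """Hypothetical helper: compose a where clause string from parts."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # The lower date bound is always present.
    clauses = [f"publication_datetime >= '{start.strftime(fmt)}'"]
    # Optional upper date bound.
    if end is not None:
        clauses.append(f"publication_datetime <= '{end.strftime(fmt)}'")
    # Optional language filter, compared in lower case as in the examples.
    if language_code is not None:
        clauses.append(f"LOWER(language_code) = '{language_code.lower()}'")
    return " AND ".join(clauses)
```

The resulting string could then be used as the first argument of SnapshotQuery, or passed directly as the query parameter of Snapshot.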

date_field = ''
file_format = ''
frequency = ''
get_analytics_query()

Obtain analytics Query.

get_explain_query()

Obtain Base Query.

get_extraction_query()

Obtain the string querying Factiva.

group_by_source_code = None
group_dimensions = None
limit = 0
top = 0